◐ Shell
reader mode source ↗
Skip to content

bpo-15873: Implement [date][time].fromisoformat#4699

Merged
abalkin merged 21 commits into
python:masterfrom
pganssle:from_isoformat
Dec 21, 2017
Merged

bpo-15873: Implement [date][time].fromisoformat#4699
abalkin merged 21 commits into
python:masterfrom
pganssle:from_isoformat

Conversation

@pganssle

@pganssle pganssle commented Dec 4, 2017

Copy link
Copy Markdown
Member

Per discussion on the python-dev mailing list, this is a C implementation of fromisoformat as alternate constructors for datetime, date and time, resolving this issue.

If it is deemed desirable, it can be extended later to cover all ISO-8601 datetime strings, but I believe the consensus so far is to have a minimum implementation that only covers the outputs of datetime.isoformat().

One thing I'd like to call attention to is that my profiling seems to indicate that in the common case (i.e. not calling this from a subclass), using the C API directly is much faster than going through PyObject_CallFunction.

Here is a profiling script:

from datetime import datetime
from datetime import timezone, timedelta
from timeit import default_timer as timer

N = 100000
tzi = timezone(timedelta(hours=4))
tzi = None
comps = (2014, 3, 25, 4, 17, 30, 204300, tzi)
dt = datetime(*comps)

s = timer()
for i in range(N):
    dt_c = datetime(2014, 3, 25, 4, 17, 30, 241300, tzinfo=tzi)
e = timer()

tt = 1000000000 * (e - s) / N
dtstr = dt.isoformat()

s = timer()
for i in range(N):
    dt_fi = datetime.fromisoformat(dtstr)
e = timer()
tt2 = 1000000000 * (e - s) / N

print('datetime constructor: {:0.1f}ns'.format(tt))
print('fromisoformat:       {:0.1f}ns'.format(tt2))

Result on my laptop:

(tzi is None):
datetime constructor: 2294.1ns
fromisoformat:       944.5ns

(tzi = timezone(timedelta(hours=4)))
datetime constructor: 2229.1ns
fromisoformat:       1400.5ns

If there's no particularly pressing reason why all the other alternate constructors universally go through the main constructor call, I could write a small function or macro that would take the fast path if available and use it for all the alternate constructors.

CC @abalkin @mariocj89

https://bugs.python.org/issue15873

@pganssle pganssle changed the title Implement [date][time].fromisoformat (GH-15873) Dec 4, 2017
@pganssle

pganssle commented Dec 4, 2017

Copy link
Copy Markdown
Member Author

One other thing I'll note - the pure python implementation here is at least partially optimized for speed. There is a fairly trivial implementation that looks, more or less, like this:

from datetime import datetime 

_base_strptimes = {
    10: '%Y-%m-%d',
    13: '%Y-%m-%dT%H',
    16: '%Y-%m-%dT%H:%M',
    19: '%Y-%m-%dT%H:%M:%S',
    23: '%Y-%m-%dT%H:%M:%S.%f',
    26: '%Y-%m-%dT%H:%M:%S.%f'
}

def from_strptime(dtstr):
    if not isinstance(dtstr, str):
        raise TypeError('isoformat takes str')

    if len(dtstr) >= 19 and dtstr[-6] in '+-':
        fmt = _base_strptimes[len(dtstr) - 6] + '%z'
    else:
        fmt = _base_strptimes[len(dtstr)]

    return datetime.strptime(dtstr, fmt)

This will work, but strptime is actually fairly slow, and in my benchmarks the pure python implementation here is twice as fast as the strptime-based method.

@pganssle pganssle changed the title [bpo-15873] Implement [date][time].fromisoformat Dec 4, 2017
@bedevere-bot

Copy link
Copy Markdown

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@pganssle pganssle force-pushed the from_isoformat branch 2 times, most recently from 39a65c3 to df3c032 Compare December 5, 2017 01:53
@pganssle

pganssle commented Dec 5, 2017

Copy link
Copy Markdown
Member Author

@abalkin @vadmium I think I've fixed all the concerns raised so far - some good catches in there. There are three documentation-related concerns:

  1. Per @vadmium, although there are tests that ensure that this and other alternate constructors have heritable behavior (e.g. MyDateTimeClass.fromisoformat(dt.isoformat()) should return a MyDateTimeClass), this is not explicitly documented. I think we could add a note at the class level like, "Alternate constructors called on datetime subclasses will return an instance of the subclass" - though to some extent that feels like it's already part of the implicit contract (considering it's a classmethod and not a staticmethod).

  2. I do not explicitly mention the fact that +00:00 and -00:00 will attach timezone.utc instead of timezone(timedelta(0)). I think this is almost certainly the correct behavior, but it may be somewhat surprising and I could easily see dropping this special-casing of the zero-offset zone. Presumably I should at least document this?

  3. Per some discussions on the issue, this is explicitly the "minimum viable feature set". Currently it's true that datetime.fromisoformat will parse a string if and only if that string can be generated by datetime.isoformat. Do we want to put something in the documentation equivalent to "don't rely on this to validate whether or not something came from datetime.isoformat, because we're only agreeing to guarantee that it will parse anything that datetime.isoformat can generate, and we reserve the right to accept other formats"?

@mariocj89

Copy link
Copy Markdown
Contributor

To fully support isoformat output you also need to parse seconds on the timezone:

datetime.datetime.now(datetime.timezone(datetime.timedelta(seconds=30))).isoformat()
'2017-12-06T12:28:38.567161+00:00:30'

@pganssle

pganssle commented Dec 6, 2017

Copy link
Copy Markdown
Member Author

@mariocj89 Thanks for the catch. The PR is updated with tests and implementation. Didn't realize that second precision was allowed in isoformat() in Python 3.7.

@abalkin

abalkin commented Dec 7, 2017

Copy link
Copy Markdown
Member

Didn't realize that second precision was allowed in isoformat() in Python 3.7.

Moreover, the sub-second precision is now allowed as well:

>>> from datetime import *
>>> import random
>>> datetime.now(timezone(timedelta(hours=24*random.random()))).isoformat()
'2017-12-08T11:34:30.238049+15:47:34.297540'

The good news is that you should be able to reuse the time fields parser logic to parse the timezone.

cc: @pganssle

@pganssle

pganssle commented Dec 9, 2017

Copy link
Copy Markdown
Member Author

@abalkin Unless I'm doing something wrong, it seems that the pure python implementation of isoformat() doesn't actually support subsecond precision.

import sys 
sys.modules['_datetime'] = None     # cause ImportError
from datetime import *

print(time(12, 30, 15, tzinfo=timezone(timedelta(microseconds=123456))).isoformat())
# 12:30:15+00:00:00

print(datetime(2017, 1, 1, 12, 30, 15,
      tzinfo=timezone(timedelta(microseconds=123456))).isoformat())
# Traceback (most recent call last):
#  File "<stdin>", line 2, in <module>
#  File ".../cpython/Lib/datetime.py", line 1832, in isoformat
#   assert not ss.microseconds
# AssertionError

I suppose I'll fix that as part of this PR.

@pganssle

pganssle commented Dec 9, 2017

Copy link
Copy Markdown
Member Author

Latest changes fix both isoformat() and fromisoformat() in pure python and C implementations for subsecond offsets.

35 hidden items Load more…
@abalkin

abalkin commented Dec 13, 2017

Copy link
Copy Markdown
Member

it seems that the pure python implementation of isoformat() doesn't actually support subsecond precision.

@pganssle - It looks like you are right. Please mention bpo-5288 in the commit message when you fix this. Please also find out why this was not caught by the test suit. We should be running all tests with and without C acceleration.

@pganssle

Copy link
Copy Markdown
Member Author

@abalkin It's already fixed and tests added. I didn't see any tests actually testing subsecond support for timezones in .isoformat, which is why it wasn't caught by the tests.

I just pushed a rewritten history that mentions bpo-5288 in the commit fixing the pure python version.

@abalkin

abalkin commented Dec 13, 2017

Copy link
Copy Markdown
Member

@pganssle - great job! This PR looks good to me now and I will merge it in a few days to give others a chance to review.

@pganssle

pganssle commented Dec 18, 2017

Copy link
Copy Markdown
Member Author

@vadmium Thanks for all the comments. I've added a test for the ambiguous cases you identified and updated the documentation (plus dropped the unused sep parameter).

@pganssle

Copy link
Copy Markdown
Member Author

@abalkin @vadmium Any thoughts on the question of whether to add a sep parameter to fromisoformat (from this now-hidden comment)?

The only reason to do so would be if in a future release we want to support ISO-8601 style strings that are just dates with offsets attached (e.g. 2017-01-01+12:00). I'm not sure, but I think ISO-8601 makes time zone offsets a property of time, so such dates-with-offsets are probably not actually valid ISO-8601 strings. Because 2017-01-01+12:00 could have been generated from datetime.isoformat(sep='+'), in its current form it would be impossible to get fromisoformat to interpret that +12:00 as anything except the time component. If we make the default behavior "strict", then in the future we would be able to support these "out of the box", specifying the separator as + or - would cause these style strings to be interpreted as times, and otherwise they would be offsets.

Having given this a bit of thought, I think it's probably a good idea to leave out sep, and if we want to support this use case in future releases, specifying a separator other than + or - would allow these things to be parsed as intended. Having the default value ignore the separator will make it easier to handle what is probably the most frequent use cases, which will be sep=' ' and sep='T' out of the box without any sort of branching to detect which one you're in, etc.

If we assume that some fraction of people will want to be strict and some people will want to be lenient, no matter which one we choose, the other behavior can easily be emulated. If we choose lenient by default, strict users can do the equivalent with:

def strict_fromisoformat(dtstr, sep='T'):
    if dtstr[10:11] != sep:
        raise ValueError('Invalid isoformat string: %s' % dstr)
    return datetime.fromisoformat(dtstr)

If we choose strict by default, the "lenient by default" option is:

def lenient_fromisoformat(dtstr):
    # Assuming sep requires a single character be passed, for dates it can be anything
    return datetime.fromisoformat(dtstr, sep=dtstr[10:11] or 'T')

Anyway, this is a kind of long comment just to say, "I think we should keep the status quo", but I thought I'd at least outline my thinking.

@abalkin

abalkin commented Dec 19, 2017

Copy link
Copy Markdown
Member

Any thoughts on the question of whether to add a sep parameter to fromisoformat

-1 or adding a parameter, but I would not mind having fromisoformat() not checking the separator at all.

@pganssle

Copy link
Copy Markdown
Member Author

@abalkin The current behavior is to allow any single character for the separator at all, if that's what you mean.

@abalkin

abalkin commented Dec 19, 2017

Copy link
Copy Markdown
Member

The current behavior is to allow any single character for the separator at all, if that's what you mean.

Yes. That's what I want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants