bpo-24905: Support BLOB incremental I/O in sqlite module by palaviv · Pull Request #271 · python/cpython
The APSW doc for reference is at https://rogerbinns.github.io/apsw/blob.html
Does having len make sense? Files don't have that method. It is also confusing - should len return the size from the current seek offset?
The documentation should make clearer that you cannot change the size of a blob, and mention zeroblob as the means to make a blob in a query without having to fill it in.
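To illustrate the zeroblob suggestion, here is a minimal sketch using the standard sqlite3 module. The table and column names (attachments, data) are made up for the example; only the zeroblob() and length() SQL functions come from SQLite itself.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE attachments(id INTEGER PRIMARY KEY, data BLOB)")

# zeroblob(N) reserves an N-byte blob filled with zero bytes, so the
# application never has to build the placeholder bytes itself.
con.execute("INSERT INTO attachments(id, data) VALUES (1, zeroblob(1024))")

(size,) = con.execute(
    "SELECT length(data) FROM attachments WHERE id = 1"
).fetchone()
(data,) = con.execute(
    "SELECT data FROM attachments WHERE id = 1"
).fetchone()
con.close()
```

Because the blob size cannot change afterwards, the size passed to zeroblob() has to be the final size of the data.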
It may be worth mentioning that another approach is to store large data in a file, and only store the filename in the database. (This comes up on the sqlite-users mailing list quite a lot.)
I cannot remember ever needing to read or write part of a BLOB; it was always "all or nothing" for me. So I never used the BLOB APIs; instead I always SELECT/INSERT/UPDATE BLOB columns. In Postgres they are not even BLOB columns: I always use the BYTEA type.
So I'm -0 on exposing BLOB API for SQLite.
@phdru SQLite is the same with regular queries: you can only read or write blobs in their entirety. That for example means that if you store a 25MB blob then you must read or write 25MB at once.
SQLite has the "incremental blob" API for accessing just portions of blobs. The motivation comes from "Lite" in the name - developers use SQLite because it is lighter weight (amongst other reasons). DBAPI doesn't specify incremental blob I/O so only developers intending to use SQLite directly and not another database would use it. Should they be able to?
Thanks for the input @rogerbinns.
Does having len make sense? Files don't have that method. It is also confusing - should len return the size from the current seek offset?
What is the difference between implementing __len__ and the length method the APSW blob has?
@palaviv there is no difference between the value returned by len and length or similar methods. It is however very uncommon to have a len method on file like objects - I couldn't find an example of any! For example StringIO is closest and has no len. Hence my recommendation to avoid len in favour of another method name.
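As a small illustration of the point about file-like objects, the sketch below checks that io.BytesIO has no __len__ and shows the usual seek/tell way of getting a total size that does not depend on the current offset. The helper name stream_size is made up for the example.

```python
import io


def stream_size(stream) -> int:
    """Return the total size of a seekable stream, keeping its position."""
    position = stream.tell()
    size = stream.seek(0, io.SEEK_END)  # seek() returns the new absolute offset
    stream.seek(position)               # restore the caller's position
    return size


buffer = io.BytesIO(b"0123456789")
buffer.seek(4)

has_len = hasattr(buffer, "__len__")  # file-like objects normally lack len()
total = stream_size(buffer)           # full size, not "bytes remaining"
```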
@serhiy-storchaka good example. They don't document it though, and there is a size() method, although it returns something slightly different. There also seems to be a correlation between types that have len and those that you can index like an array.
In any event my recommendation is to avoid breaking new ground with a len method since that seems not to be normal practise for this kind of thing that provides a file like interface.
I actually think that we should use __len__, since by definition this is the length of the object. The Blob object is a representation of the BLOB, and that is the BLOB length.
Pull request conversations are meant for discussing the code.
It would be better to continue the design discussion on the bug tracker or mailing list.
@serhiy-storchaka I have implemented the sequence protocol but I have a few questions:
- Do I need both PySequenceMethods.sq_item, PySequenceMethods.sq_ass_item and PyMappingMethods.mp_subscript, PyMappingMethods.mp_ass_subscript?
- I can't make __contains__ work. Could you point me to how to fix that?
I think that the contains operation should not be supported for blobs. As blobs can be very large looking for a subset of bytes inside them will be a very inefficient process in memory or in compute.
The BLOB size cannot be changed using the :class:`Blob` class. Use
``zeroblob`` to create the blob in the wanted size in advance.

.. versionadded:: 3.7
3.8
Done
Blob Objects
------------

.. versionadded:: 3.7
3.8
Done
you can retarget this for py 3.9
Retarget for python 3.9
3.9
Hi @palaviv, is there a plan for this PR?
It was opened 3 years ago. Are you still interested in this patch?
If yes, could you fix the conflicts, please?
Hi @eamanu,
This patch has existed since January 2016 and I have kind of given up on it ever going into CPython, as there is no core developer who works on sqlite. I would recommend using apsw, which supports this feature. If any core developer is interested in working on this, I would gladly make any needed changes.
I'd love to see this land in Python. I think there's a strong case for it: SQLite lets you store up to 2GB of data in a BLOB, and reading an entire 2GB value into memory at once isn't nearly as pleasant as reading it incrementally, which is what this would let us do.
Thanks for the review @berkerpeksag. I have made the requested changes; please review again.
Thanks for making the requested changes!
@berkerpeksag: please review the changes made to this pull request.
Other than rebasing (due to new conflicts arising over time), is there anything that can be done to help move this PR along?
(@palaviv do you want to do the rebase? if you'd like or are too busy I can do the rebase, though I'd need to open a new PR since I don't think I can modify yours)
@nightlark, @palaviv: Here's a short list, off the top of my head, of what is needed to rebase this onto main:
- the test suite has been normalized; we now use snake case test_foo_bar method names
- Argument Clinic
- use heap types iso. static types
- exception types are accessed through the (temporary) global state; for Connection objects, it's available through self->state
If you want to try to land this, Ryan, please give Aviv a week or so to respond before opening a new PR :)
@erlend-aasland Okay, I think I understand how to use Argument Clinic. Is there a guide to what "iso. static types" (or heap types) means? Is the iso. a prefix for the types or an abbreviation? If it's an abbreviation, maybe that's the missing search term I should be using to find relevant resources.
erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this pull request
I just merged #30680, a simplified version of this PR. Blobs will be in Python 3.11.
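For readers on Python 3.11 or later, a rough sketch of incremental blob I/O with sqlite3.Connection.blobopen might look like the following. The helper name round_trip_in_chunks, the files table and the tiny chunk size are assumptions made for the example; the demo call is skipped on older Python versions, where blobopen does not exist.

```python
import sqlite3


def round_trip_in_chunks(payload: bytes, chunk_size: int = 4) -> bytes:
    """Write payload into a BLOB piece by piece, then read it back in pieces."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table, used only for illustration.
    con.execute("CREATE TABLE files(id INTEGER PRIMARY KEY, data BLOB)")
    # The blob cannot grow later, so reserve its final size with zeroblob().
    con.execute(
        "INSERT INTO files(id, data) VALUES (1, zeroblob(?))", (len(payload),)
    )
    chunks = []
    with con.blobopen("files", "data", 1) as blob:
        # Each write() continues from the current offset.
        for start in range(0, len(payload), chunk_size):
            blob.write(payload[start:start + chunk_size])
        blob.seek(0)
        # read() returns an empty bytes object once the end is reached.
        while True:
            chunk = blob.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    con.close()
    return b"".join(chunks)


# Only run the demo where the API is available (Python 3.11+).
if hasattr(sqlite3.Connection, "blobopen"):
    result = round_trip_in_chunks(b"incremental blob I/O")
```

In real use the chunk size would be much larger; the point is only that neither the write nor the read needs the whole value in memory at once.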