LargeBinary should truly be unlimited in MySQL

Issue #4177 wontfix
NameZero912
created an issue

The docs mention "...a large and/or unlengthed binary type for the target platform, such as BLOB on MySQL". Using BLOB on MySQL doesn't make sense (as it is limited to 64 kb). LONGBLOB seems to be a better choice (4 GB limit).

Comments (6)

  1. Michael Bayer repo owner

    reasons against this change:

    1. it would break all Alembic migrations for projects that use LargeBinary, all of them would need to add a permanent rule to their autogenerate to adjust for this difference (definitely disqualifies 12 milestone at the very least).

    2. LONGBLOB is not actually very usable with Python, since AFAIK the drivers buffer the entire value of a column in memory on both the read and write side, e.g. there is no streaming. Inserting and reading an actual 4G value is not really practical. If the drivers like mysqlclient or pymysql have means of streaming (I don't think they do, but feel free to provide resesarch for this) this would still imply a major API change as you would need to stream at every level, indicating you wouldn't just have a generic LargeBinary, you'd have StreamingBinary() or something like that.

    3. the different BLOB types have different size requirements, LONGBLOB takes up three bytes more than BLOB in the row. A small number, but I would estimate that the overwhelming vast majority of BLOB usage is well within the 64K limit.

    4. if you need a bigger BLOB with your LargeBinary this is available via the API using variant: LargeBinary().with_variant(mysql.LONGBLOB(), "mysql") . Use of the with_variant() method is common for applications that target MySQL since MySQL's datatypes have so many tuning parameters.

    Leaning towards "wontfix" however feel free to provide detail on how 4G values, or even 16M MEDIUMBLOB values, can be read and persisted pratically given current drivers. I haven't looked into it in a long time.

  2. NameZero912 reporter

    Thank you for your detailed response. I'm not into driver development, so I cannot provide any further insights there. I was actually using PickleType, which internally uses LargeBinary. I was able to simply adapt the implementation via a monkeypatch: PickleType.impl = LargeBinary().with_variant(mysql.LONGBLOB(), "mysql")

    I'd also be inclined to mark it as "won't fix". However, it would make sense to mention the size limitation in the documentation, at least for PickleType. Pickled data can easily exceed the 64 kb limit.

  3. Michael Bayer repo owner

    so MEDIUMBLOB might be a better default here since while it goes up to the less practical limit of 16M, it does accommodate for the whole world of htings > 64K, and 64K, while it is the entire memory available of my Atari 800, is quite small by todays standards :)

  4. Michael Bayer repo owner

    unfortunately while MEDIUMBLOB might have been a better choice, at this point I don't have the resources to attend to the backwards compatibility issues this will raise.

  5. Log in to comment