◐ Shell
reader mode source ↗
Skip to content

bpo-24665: double-width CJK chars support for textwrap#89

Closed
fgallaire wants to merge 8 commits into
python:masterfrom
fgallaire:master
Closed

bpo-24665: double-width CJK chars support for textwrap#89
fgallaire wants to merge 8 commits into
python:masterfrom
fgallaire:master

Conversation

@fgallaire

@fgallaire fgallaire commented Feb 14, 2017

Copy link
Copy Markdown
  • Add ckj option flag, default to False
  • Add cjk_wide(), cjk_len() and cjk_slices() utilities

* Add ckj option flag, default to False
* Add cjkwide(), cjklen() and cjkslices() utilities
@the-knights-who-say-ni

Copy link
Copy Markdown

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:

  1. If you don't have an account on b.p.o, please create one
  2. Make sure your GitHub username is listed in "Your Details" at b.p.o
  3. If you have not already done so, please sign the PSF contributor agreement
  4. If you just signed the CLA, please wait at least one US business day and then check "Your Details" on bugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA)
  5. Reply here saying you have completed the above steps

Thanks again to your contribution and we look forward to looking at it!

@fgallaire

fgallaire commented Feb 14, 2017

Copy link
Copy Markdown
Author

I now have listed my GitHub username at bpo and i had already signed the CLA.

@fgallaire fgallaire force-pushed the master branch 2 times, most recently from e52db22 to 5c4b2af Compare February 14, 2017 09:26

@vstinner vstinner left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

You need to document the name option in Doc/library/textwrap.rst, don't forget ".. versionchanged:: 3.7" markup.

@vstinner

Copy link
Copy Markdown
Member

By email I proposed a different design. Make existing TextWrapper extensible: add text_width() and truncate() methods. Then add a new TextWrapperCJK which overrides these methods. It would allow to reuse the code for other cases than just CJK.

@vstinner

Copy link
Copy Markdown
Member

Your change breaks Python build: Python requires optparse to compile modules like unicodedata, optparse imports textwrap which now always requires unicodedata.

Using two different classes, it would be simpler to only import unicodedata when the TextWrapperCJK class is instanciated, "on demand", and so fix the bootstrap issue.

@fgallaire

fgallaire commented Feb 15, 2017

Copy link
Copy Markdown
Author

What about a depreciation warning to inform that cjk default will switch to True in Python 3.8 ?

@fgallaire

fgallaire commented Feb 15, 2017

Copy link
Copy Markdown
Author

optparse is deprecated since Python 3.2 so it should not drive this work, but a port to argparse will not solve this problem because of the same cycling dependency.

@codecov

codecov Bot commented Feb 15, 2017

Copy link
Copy Markdown

Codecov Report

❗ No coverage uploaded for pull request base (master@984eef7). Click here to learn what that means.
The diff coverage is 72.5%.

@@            Coverage Diff            @@
##             master      #89   +/-   ##
=========================================
  Coverage          ?   82.38%           
=========================================
  Files             ?     1428           
  Lines             ?   351193           
  Branches          ?        0           
=========================================
  Hits              ?   289333           
  Misses            ?    61860           
  Partials          ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 984eef7...54de7aa. Read the comment docs.

@methane methane left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

While I agree supporting east_asian_wdith is preferable, I don't like current API.
It's OK for third party library. But for standard library, I prefer more generic API and implementation.

While "practical beats purity", I want glibc's wcwidth for UTF-8 at least. (bonus: option like ambiwidth=double in vim)

@yan12125

Copy link
Copy Markdown

I want glibc's wcwidth for UTF-8 at least

Disagreed. I guess text editors and terminals are top two applications in textwrap. I known little about text editors. In terminals, things are quite interesting. Different terminals use different versions of EastAsianWidth.txt. Here are some examples:

  • tmux, QTerminal: use wcwidth() by default. In the latest glibc, it's still Unicode 8.0. [1] On Mac, it's reported that wcwidth() is broken [2].
  • VTE (gnome-terminal, xfce4-terminal, etc.): Unicode 9.0 [3]
  • Konsole: Unicode 5.0 [4]

Different Unicode versions can lead to different results for textwrap. Take U+231A (clock emoji) for example. In Unicode 8.0 it's NEUTRAL. Most implementations see it as width=1. In Unicode 9.0 it's changed to WIDE.

In CPython, handling multiple Unicode versions sounds impractical. IMO chasing the latest Unicode 9.0 is a good idea. If a terminal emulator is not compatible with Unicode 9.0, it should be fixed.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=20313
[2] tmux/tmux#515
[3] https://bugzilla.gnome.org/show_bug.cgi?id=771591
[4] https://github.com/KDE/konsole/blob/master/src/konsole_wcwidth.cpp#L55

@yan12125

Copy link
Copy Markdown

Also, I would recommend rename cjk_foobar stuffs to unicode_foobar. CJK characters occupy a large portion on Unicode tables but not the whole.

@methane

methane commented Feb 16, 2017

Copy link
Copy Markdown
Member

@yan12125 sorry, my wording was wrong. (I'm not good at English).
When I said "I want glibc's wcwidth for UTF-8 at least", I meant
"if textwrap supports display width, I think it should be good as wcwidth at least.".
I didn't meant neither "textwrap should use wcwidth" nor "textwrap should implement algorithm
exactly same to wcwidth."

BTW, pull request is not place for discussion like this.
Please go to http://bugs.python.org/issue24665

18 hidden items Load more…
akruis pushed a commit to akruis/cpython that referenced this pull request Sep 9, 2017
…er shutdown code

Fix the implementation to work as documented if a thread dies. Now
Stackless kills only tasklets with a nesting level > 0. During interpreter
shutdown stackless additionally kills daemon threads, if they execute Python code or
switch tasklets. This prevents access violations after clearing the thread states.

Add a 1ms sleep for each daemon thread. This way the thread gets a better
chance to run. Thanks to Kristján Valur Jónsson for suggesting this improvement.

Add a test case for a C assertion violation during thread shutdown.
Add some explanatory comments to the shutdown test case.
Thanks to Christian Tismer for the sugestion.

https://bitbucket.org/stackless-dev/stackless/issues/81
https://bitbucket.org/stackless-dev/stackless/issues/89
(grafted from 2cc41427a347a1ebec4eedc3db06a3664e67f798, 1388a2003957,
6140f5aaca2c and edc9b92ec457)
akruis pushed a commit to akruis/cpython that referenced this pull request Sep 9, 2017
@brettcannon

Copy link
Copy Markdown
Member

To try and help move older pull requests forward, we are going through and backfilling 'awaiting' labels on pull requests that are lacking the label. Based on the current reviews, the best we can tell in an automated fashion is that a core developer requested changes to be made to this pull request.

If/when the requested changes have been made, please leave a comment that says, I have made the requested changes; please review again. That will trigger a bot to flag this pull request as ready for a follow-up review.

@methane

methane commented Jul 8, 2018

Copy link
Copy Markdown
Member

I close this because I don't like APIs this PR has and the author seems satisfied by
putting his library on PyPI.

@methane methane closed this Jul 8, 2018
jaraco pushed a commit that referenced this pull request Dec 2, 2022
jaraco added a commit to jaraco/cpython that referenced this pull request Feb 17, 2023
@fgallaire fgallaire mannequin mentioned this pull request Apr 10, 2022
SonicField added a commit to SonicField/cpython that referenced this pull request Apr 23, 2026
Per supervisor [chat L2626] + L2742 + librarian L2737 (close pythia python#87
python#3 ephemera-gap durably).

Captures empirical W26 push 84 incident (95c9f9b1553c14 fix):
- Both cinder_opcode_ids.h + Include/opcode.h use Py_OPCODE_H guard
- Include order silently shadows whichever loads second
- W26 case: BINARY_OP_ADD_INT undefined → attr_probe HIR regression
  caught by W21 golden trip-wire

Three resolution options enumerated:
- (A) rename cinder_opcode_ids.h guard to CINDERX_OPCODE_IDS_H —
  recommended, eliminates collision class
- (B) static_assert in cinder_opcode.h to detect collision at compile
  — hardens detection without fixing
- (C) header-comment only — current state, NOT recommended (chat-
  ephemera proven insufficient)

Related W29 candidate noted: PHX_PRIM_OP_* / PHX_PRIM_UOP_* hard-coded
in builder_emit_c.c lines 3727-3746 with no static_assert binding to
classloader.h authoritative #defines (per pythia python#89 python#3 re-issue).

DEFERRED per supervisor — schedule post-Batch-2 burndown, before any
upstream sync touching Include/opcode.h or classloader.h.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants