◐ Shell
clean mode source ↗

gh-87092: compiler's CFG construction moved to after codegen stage by iritkatriel · Pull Request #102320 · python/cpython

…r codegen and cfg_builder

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @iritkatriel for commit b8f4acf 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @iritkatriel for commit e5653ee 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @iritkatriel for commit 788da74 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@iritkatriel

The test failures are a couple of timeouts plus a couple of errors that are also showing up on other branches.

@markshannon

Do we really want another instruction struct and another instruction sequence struct?
A basic block, extended block and general instruction sequence, are much the same. The main differences being where jumps are allowed and how they are resolved.
The allowed use and meaning of branches should be clear from the context.
If you want stronger typing in C, just wrap the base instruction sequence:

typedef struct InstructionSequence InstructionSequence;
typedef struct {
    InstructionSequence instructions;
} InstructionList;

typedef struct {
    InstructionSequence instructions;
} BasicBlock;

The reason I'd like a single InstructionSequence, is that we can stick an object header in front of it and access it from Python.
Having instruction sequences be Python objects would make introspection and debugging so much nicer.

@iritkatriel

We can have an InstructionSequence type which is reused in basic blocks, but basic blocks have a lot more fields that we don't need for codegen, and they don't need a label map because they should have pointers to the target blocks.

@markshannon

We can have an InstructionSequence type which is reused in basic blocks

That makes sense.

@iritkatriel

We can have an InstructionSequence type which is reused in basic blocks

That makes sense.

I'm not sure.

The instruction in a basicblock is

struct cfg_instr {
    int i_opcode;
    int i_oparg;
    location i_loc;
    /* The following fields should not be set by the front-end: */
    struct basicblock_ *i_target; /* target block (if jump instruction) */
    struct basicblock_ *i_except; /* target block when exception is raised */
};

We can replace the first 3 fields by an instruction struct (this makes sense, I only avoided it to keep the diff smaller). But we would still need the target pointers, or to create a parallel data structure in the basic block for the pointers (and keep it in sync with the instruction sequence when we insert or delete instructions inside a block). It would be simpler to write a function that translates a CFG to something that can be exposed to python.

@iritkatriel

In any case, I don't think we should refactor the CFG data structures in the same PR. It will be another huge change.

@iritkatriel

I've created a function for resizing the doubling arrays, so this logic is not repeated 3 times now.

To share the instruction type between codegen and cfg, we could do:

struct cfg_instr {
    instruction i_instr;
    struct basicblock_ *i_target; /* target block (if jump instruction) */
    struct basicblock_ *i_except; /* target block when exception is raised */
};

instead of the current:

struct cfg_instr {
     int i_opcode;
     int i_oparg;
     location i_loc;
    struct basicblock_ *i_target; /* target block (if jump instruction) */
    struct basicblock_ *i_except; /* target block when exception is raised */
};

Do you think we should do it in this PR? It would make the diff quit a lot messier.

markshannon

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions, otherwise looks good.

@iritkatriel

A couple of questions, otherwise looks good.

Thank you. I'll merge then.

carljm added a commit to carljm/cpython that referenced this pull request

Mar 7, 2023

carljm added a commit to carljm/cpython that referenced this pull request

Mar 8, 2023