Discussion: Pushing batches of data to online store: Should `conn.commit()` happen in the for loop or after?
This piece of code in the online_write_batch function in the Postgres online store pushes data in batches to the online store.
I was wondering whether it makes more sense to put the conn.commit() inside the for loop, or after the for loop. I would love to hear the different trade-offs between the two options!
Copy of the code snippet here:
batch_size = 5000 for i in range(0, len(insert_values), batch_size): cur_batch = insert_values[i : i + batch_size] execute_values( cur, sql.SQL( """ INSERT INTO {} (entity_key, feature_name, value, event_ts, created_ts) VALUES %s ON CONFLICT (entity_key, feature_name) DO UPDATE SET value = EXCLUDED.value, event_ts = EXCLUDED.event_ts, created_ts = EXCLUDED.created_ts; """, ).format(sql.Identifier(_table_id(project, table))), cur_batch, page_size=batch_size, ) conn.commit() # << This is the point of interest. Should this be placed in the loop or after?