-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch inserts #15
Comments
Batch inserts are multiple insert statements packed together in 1 DbCommand. Are you referring to that? Or are you referring to: |
What it allows you to do is that if you are going to insert 2 BlogPosts with 2 Comments each you can now do that in two sql statements. In EF7 the code below generates the sql that comes after
By using output you could pack for example 50 inserts into one statement and still get the ids properly populated. It returns tabular data instead of a scalar you get with SCOPE_IDENTITY(). |
ah! I didn't know that option was available in T-SQL! It can be used indeed if there's no relationship with self (so all rows inserted aren't referenced by any other row inserted), also not sure whether other DBs have similar mechanics available, but it's worth checking out. It might clash though with per-row features added later on (auditing/authorization/validation) but we'll see. :) |
Even self referencing relationsships should be solvable. If it's worh it is another question though. Building a dependency graph should solve it in the generic case. It's a bit of work but should be very doable. If we assume the most common use case you will only have 1-M self refering relationsships (a hierarchy) and they Add() the items as they "should"
This is easily divided by using a series of collections.
N-M relationsships: are not a problem as all the items are can be inserted at once and the mapping table get inserted after. 1-1 relationsships: Not even sure if L2S supports it. I don't think I used it. I think it can be done but if you insert both entities it would require inserting and then updating as far as I can see (should be same problem with single row). A more generic algorithm is needed if the user decides to insert like:
I have to think about if I can find a simple and fast algorithm for that. I have a few ideas but need to test/google a bit. |
L2S supports 1:1 relationships just fine, it doesn't support m:n relationships :) Sorting the dependencies isn't hard, it's done using a DAG and the algorithm 'Topological Sorting'. It's not doable to insert dependent rows within the same batch, you have to insert them separately (first PK side, then FK side with PK value obtained from PK side insert). One main pain point with batch inserts is that if a row fails (e.g. due to a UC violation or FK violation), the whole batch fails but you don't know which one failed. For me this has always been a showstopper, as I don't see how it can be beneficial to the user to have a persist fail with a generic 'something when wrong, dunno what, figure it out!' kind of error ;) |
One cool thing EF7 will do is to use OUTPUT in Sql Server to support batch inserts even with db generated ids. I have not investigated if any other engines support it or have similar.
Compared to SqlBulkCopy it's ofcourse slower, but the advantage is that it supports FK fixup automatically ang gives better error messages and might even be faster under a certain treshold. It already needs code to decide in which order a graph should be inserted so should be a quite easy implementation.
The text was updated successfully, but these errors were encountered: