Performance issue and optimization ideas - TMVCJsonDataObjectsSerializer.JsonObjectToDataSet #553
Comments
Yes, unfortunately this method is called for each item (n) in the JSON array (from procedure TMVCJsonDataObjectsSerializer.JsonArrayToDataSet). And the method you referenced calls those RTTI methods for each field (m), so this becomes an O(nm) cost (8,000 rows x 38 fields). My initial thought would be to build an RTTI structure once (via a call to a nested proc) and then pass that structure to the method that converts each element of the array to a dataset row.
Yes guys, that part of the code has been under my radar for too long... I have to give it some attention.
I have made a simple example (console application) with the latest DMVC units and Delphi 11.
@MPannier the repo version contains many improvements over 3.2.1-carbon, so it sounds good that it's faster than the latest released version. We can however still use the test you wrote to fine-tune it and, perhaps, put this test in the test suite as a performance test.
Happy to help with testing
Hi guys, did you find some areas to improve with a reasonable gain? The current version is quite fast in the @MPannier test.
I will do a new test with my real-world project this week and give you feedback.
I wrote a simple VCL app to create and populate a 3-column dataset with 20k records, serialise them out to a JSON string and then read them back. Deserialisation with the current code base is no faster than with the code base we have in production (from Sept 2021). We're seeing around 10 seconds to deserialise the JSON. The root of the issue is the repeated lookup of the dataset field using RTTI: 20k x 3 is 60k lookups where 3 would suffice.
To improve performance you'd need to provide an overload and do the setup of the TMVCDataSetFields before calling this new method once for each record.
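The "60k lookups where 3 would suffice" point can be illustrated with a minimal console sketch. It does not use the DMVC units: ResolveSerializedName is a hypothetical stand-in for the RTTI-based name resolution, and the field names and counter are made up for illustration. It only counts how often the expensive step runs in each loop shape.

```pascal
program HoistLookupSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  ROW_COUNT = 20000;
  FIELD_NAMES: array [0 .. 2] of string = ('ID', 'Name', 'City');

var
  gLookups: Integer = 0;

// Hypothetical stand-in for an expensive RTTI-based name resolution.
function ResolveSerializedName(const AFieldName: string): string;
begin
  Inc(gLookups);
  Result := LowerCase(AFieldName);
end;

// Naive shape: every field is resolved again for every row.
procedure LoadNaive;
var
  lRow, lCol: Integer;
  lName: string;
begin
  for lRow := 1 to ROW_COUNT do
    for lCol := Low(FIELD_NAMES) to High(FIELD_NAMES) do
      // A real loader would read the JSON value stored under lName here.
      lName := ResolveSerializedName(FIELD_NAMES[lCol]);
end;

// Hoisted shape: resolve each field once, then reuse the result per row.
procedure LoadHoisted;
var
  lNames: array [0 .. 2] of string;
  lRow, lCol: Integer;
  lName: string;
begin
  for lCol := Low(FIELD_NAMES) to High(FIELD_NAMES) do
    lNames[lCol] := ResolveSerializedName(FIELD_NAMES[lCol]);
  for lRow := 1 to ROW_COUNT do
    for lCol := Low(lNames) to High(lNames) do
      lName := lNames[lCol]; // plain array read, no lookup
end;

begin
  gLookups := 0;
  LoadNaive;
  Writeln('naive lookups:   ', gLookups); // 20000 * 3 = 60000

  gLookups := 0;
  LoadHoisted;
  Writeln('hoisted lookups: ', gLookups); // 3
end.
```

The per-row work is the same in both procedures; only the position of the lookup changes, which is why the cost of the lookup stops scaling with the number of records.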
Hello, I ran the tests in my real project with yesterday's version of DMVC Framework, and it was still slow: it takes 15000 milliseconds. My dataset contains about 7800 records and has 41 columns.
But why is the test project much faster than my real-world project? I debugged the project and found that, within the test project, the function TMVCAbstractSerializer.GetNameAs gets nil at the line ObjFld := ObjType.GetField(AComponentName); As @fastbike wrote and as I suspected, the problem is the repeated RTTI call for each column and for each record. And there are 3 procedures that call RTTI. In my case that is 7800 * 41 * 3 calls (about 960,000). How can we solve this performance problem? The new test project is attached.
I've made a small change to use the TMVCDataSetFields list, which is populated by calling GetDataSetFields in the TMVCJsonDataObjectsSerializer.JsonArrayToDataSet method, before enumerating the JSON array. Time has now fallen from ~10 seconds to ~115 milliseconds. Two orders of magnitude faster, which is a massive win! The only gotcha at this stage is that I commented out the lines handling nested datasets to get the code to compile. Either we take the hit of making a separate call to GetDataSetFields for each record that is a nested dataset, or we figure out another way of handling what could possibly be a recursive nested dataset scenario.
I'd also suggest renaming the record property, for clarity, from "I" to "Index" if that does not break too many things.
I can confirm. The change from @fastbike works much faster.
Sounds good! We need to find a way to get these improvements while maintaining the same level of functionality. I'll take a look ASAP. Thanks guys.
Hi guys. I made some changes based on your suggestions. That required some refactoring of method visibility, parameters, types and so on. However, it should be OK. All the unit tests (currently 429) pass. Please check it on your side, and thanks for your support.
Hello,
I have a JSON array with 7500 elements which I load into a TFDMemTable with 38 fields by using this function FDMemTable1.LoadFromJSONArrayString.
In my case, this takes about 14 sec. I did some research to speed up the data load. I found some optimizations for FDMemTable (BeginBatch/EndBatch and AutoCalcFields := False;).
Now the "load" takes about 12 sec. It is faster but not fast enough.
To find out which part is the bottleneck, I ran a test with a classic for i := 1 to 8000 loop that inserts hard-coded fake data into my FDMemTable. This takes less than 200 ms!
That means my FDMemTable and also the 8000 records are not the problem.
After that, I have tested the JSON parsing. I've done something like this:
lJsonArray := TJDOJsonArray.Parse(Res.BodyAsString) as TJDOJsonArray;
try
  for var J := 0 to Pred(lJsonArray.Count) do
  begin
    var lJsonObj: TJsonObject := lJsonArray.Items[J].ObjectValue;
    DataSet.Append;
    DataSetField1.AsInteger := lJsonObj.I['Field1'];
    DataSetField2.AsWideString := lJsonObj.S['Field2'];
    ...
    DataSet.Post;
  end;
finally
  lJsonArray.Free; // release the parsed array
end;
This takes only 200 ms (with all 38 fields). Compared to 12 sec, it is really fast.
Why is LoadFromJSONArrayString slower?
I found the reason in TMVCJsonDataObjectsSerializer.JsonObjectToDataSet. For a new test, I commented out the following lines:
//lName := GetNameAs(ADataSet.Owner, Field.Name, Field.FieldName);
//if (IsIgnoredAttribute(AIgnoredFields, lName)) or (IsIgnoredComponent(ADataSet.Owner, Field.Name)) then
//continue;
//lName := TMVCSerializerHelper.ApplyNameCase(GetNameCase(ADataSet, ANameCase), lName { Field.FieldName } );
lName := Field.FieldName; // the previous lines are commented out; only this one is used
Now the data load takes only 200 ms.
The problem functions are GetNameAs, IsIgnoredComponent and GetNameCase. All of these functions use RTTI to get some information from TField.
I know we can't remove these functions, but can we speed them up?
I have some ideas, but I'm not sure what the best one is.
First a question: Can TField have the attributes MVCNameAsAttribute, MVCDoNotSerializeAttribute and MVCNameCaseAttribute? Maybe we can remove one of these functions?
Is it possible to check for all attributes at the same time? Then the RTTI stuff is done only one time, not three times.
Is caching an option? (field name and a list of attributes)
Is an additional parameter an option? Something like "IgnoreNameCase" or "FieldNameAsIs"?
I have no experience with RTTI. Maybe the RTTI code can be optimized?
What do you think? Any other suggestions?
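Two of the ideas above (checking all attributes in one pass, and caching the result per field) could look roughly like the following console sketch. It is not the DMVC implementation: NameAsAttribute and DoNotSerializeAttribute are hypothetical stand-ins for MVCNameAsAttribute and MVCDoNotSerializeAttribute, and TSampleModule, TFieldMeta and BuildFieldMeta are made-up names used only for illustration.

```pascal
program FieldMetaCacheSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Rtti, System.Generics.Collections;

type
  // Hypothetical stand-ins for the framework's attributes.
  NameAsAttribute = class(TCustomAttribute)
  private
    FName: string;
  public
    constructor Create(const AName: string);
    property Name: string read FName;
  end;

  DoNotSerializeAttribute = class(TCustomAttribute);

  // Everything the per-row loop needs to know about one field.
  TFieldMeta = record
    SerializedName: string;
    Ignored: Boolean;
  end;

  // Hypothetical owner class whose fields carry the attributes.
  TSampleModule = class
  public
    [NameAs('customer_id')]
    CustomerID: Integer;
    [DoNotSerialize]
    InternalNote: string;
    City: string;
  end;

constructor NameAsAttribute.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
end;

// A single pass over the attributes answers all questions for a field.
function BuildFieldMeta(AField: TRttiField): TFieldMeta;
var
  lAttr: TCustomAttribute;
begin
  Result.SerializedName := AField.Name;
  Result.Ignored := False;
  for lAttr in AField.GetAttributes do
    if lAttr is NameAsAttribute then
      Result.SerializedName := NameAsAttribute(lAttr).Name
    else if lAttr is DoNotSerializeAttribute then
      Result.Ignored := True;
end;

var
  lCtx: TRttiContext;
  lCache: TDictionary<string, TFieldMeta>;
  lField: TRttiField;
  lPair: TPair<string, TFieldMeta>;
begin
  lCtx := TRttiContext.Create;
  lCache := TDictionary<string, TFieldMeta>.Create;
  try
    // RTTI is touched once per field, before any rows are processed.
    for lField in lCtx.GetType(TSampleModule).GetFields do
      lCache.Add(lField.Name, BuildFieldMeta(lField));

    // A per-row loop would now only read lCache (no RTTI calls).
    for lPair in lCache do
      Writeln(lPair.Key, ' -> ', lPair.Value.SerializedName,
        ' ignored=', BoolToStr(lPair.Value.Ignored, True));
  finally
    lCache.Free;
  end;
end.
```

With this shape the number of RTTI calls depends only on the number of fields, not on the number of records. Whether TField components in a real project can carry such attributes is the open question raised above; the sketch only shows the one-pass-plus-cache pattern.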