Implement serde #2
Point 1: mimicking Fortran behavior
We don't need to match Fortran behavior 100% when it comes to what happens when there are too many or too few format specs relative to the input data, but we should match it as closely as possible so that anyone translating Fortran code into Rust can use this without too much effort. Here's some F90 code I wrote for a quick test:

```fortran
program read_tests
  implicit none
  integer*4 x, y, z
  real*4 a, b, c
  character(16) name, short_input
  character(72) input
  ! First try having a longer format spec than string
  input = '12 34 56'
  read(input, '(10i3)') x, y, z
  write(*,*) 'Case 1: ', 'x=',x,'y=',y,'z=',z
  ! Then a shorter format spec
  input = '100 200 300 400'
  read(input, '(2i4)') x, y
  write(*,*) 'Case 2: ', 'x=',x,'y=',y
  ! Now try having the string be too short
  write(short_input, '(2f8.2)') 10.0, 20.0
  write(*,*) 'short_input=', short_input
  read(short_input, '(3f8.2)') a, b, c
  write(*,*) 'Case 3: ', 'a=',a,'b=',b,'c=',c
end program
```

The output is:
So what does this mean?
The first two behaviors make sense to reproduce. The third one, to me, does not. Even if Fortran programs do in fact rely on the third behavior, I think it runs counter to what we expect in Rust. The only use case I could see is if the "extra" variables are optional, in which case it should be represented in Rust as an
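To make the three cases concrete, here is a minimal std-only sketch of a hypothetical `read_fixed_ints` helper (not part of this crate's API) that reads a repeated `Iw` edit descriptor such as `(10i3)`. It mirrors cases 1 and 2 of the test above (unused format repeats and trailing input are ignored) but, following the argument above, returns an error for case 3 instead of blank-padding. Format reversion and the `BN`/`BZ` blank modes are deliberately left out, and the input is assumed to be ASCII.

```rust
/// Hypothetical helper (not part of this crate's API): read `count` integers
/// from `input` using the fixed-width edit descriptor `(<repeats>i<width>)`.
fn read_fixed_ints(input: &str, width: usize, repeats: usize, count: usize) -> Result<Vec<i64>, String> {
    // Case 1: more format repeats than variables is fine; extra repeats go unused.
    // More variables than repeats would trigger Fortran format reversion (a new
    // record); this sketch simply rejects it.
    if count > repeats {
        return Err(format!("format has {} fields but {} values were requested", repeats, count));
    }
    let bytes = input.as_bytes(); // assumes ASCII, so byte offsets are columns
    let mut values = Vec::with_capacity(count);
    for i in 0..count {
        let start = i * width;
        // Case 3: Fortran would pad the record with blanks; here it is an error.
        if start >= bytes.len() {
            return Err(format!("input ended before field {}", i + 1));
        }
        // A final field cut short by the end of the string is read as-is, because a
        // Rust string is not blank-padded to a fixed length like `character(72)`.
        let end = (start + width).min(bytes.len());
        let field = std::str::from_utf8(&bytes[start..end]).map_err(|e| e.to_string())?;
        let trimmed = field.trim();
        // An all-blank field reads as zero, as it does in Fortran.
        let value = if trimmed.is_empty() {
            0
        } else {
            trimmed
                .parse::<i64>()
                .map_err(|e| format!("field {} ({:?}): {}", i + 1, field, e))?
        };
        values.push(value);
    }
    // Case 2: any input after the last requested field is ignored.
    Ok(values)
}
```

With `read_fixed_ints("12 34 56", 3, 10, 3)` this yields `[12, 34, 56]`, while asking for three 5-column fields from a 10-character string fails rather than inventing a third value.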
Point 2: how to handle Vecs and maps?
These data structures are particularly tricky because Fortran usually uses fixed-length arrays and, more relevantly, when using a
Most common formats
I'm not sure how useful the first one would be, given that the max length has to be a compile-time const. The second one I could still see being useful, though it would be complicated to implement. For now, I think the most important goal is to make deserialization do the obvious, expected thing. If there are cases where deserialization can't access enough runtime information to correctly partition a series of values into different sequences, then those are best handled by using an intermediate type with
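One way to picture the intermediate-type approach: the deserializer only produces a flat list of values, and a `TryFrom` conversion uses runtime information in that list to split it into sequences. The sketch below assumes a hypothetical record layout in which the first value is a count `n`, followed by `n` altitudes and then `n` temperatures; `FlatRecord` and `Profile` are illustrative names, and only std is used. With serde, the same conversion could be attached through the container attribute `#[serde(try_from = "...")]`, provided the intermediate type itself implements `Deserialize`.

```rust
use std::convert::TryFrom;

/// Hypothetical intermediate type: the flat list of values a deserializer
/// could produce for one record without knowing where sequences begin or end.
struct FlatRecord(Vec<f64>);

/// Hypothetical target type holding two sequences whose length is only
/// known at run time.
#[derive(Debug, PartialEq)]
struct Profile {
    altitudes: Vec<f64>,
    temperatures: Vec<f64>,
}

impl TryFrom<FlatRecord> for Profile {
    type Error = String;

    fn try_from(record: FlatRecord) -> Result<Self, Self::Error> {
        // Assumed layout: [n, alt_1 .. alt_n, temp_1 .. temp_n].
        let (&count, rest) = record.0.split_first().ok_or("empty record")?;
        // The count must be a non-negative whole number (this also rejects NaN/inf).
        if count < 0.0 || count.fract() != 0.0 {
            return Err(format!("invalid level count {}", count));
        }
        let n = count as usize;
        // The remaining values must split evenly into the two sequences.
        if rest.len() % 2 != 0 || rest.len() / 2 != n {
            return Err(format!(
                "expected {} altitudes and {} temperatures, found {} values",
                n,
                n,
                rest.len()
            ));
        }
        Ok(Profile {
            altitudes: rest[..n].to_vec(),
            temperatures: rest[n..].to_vec(),
        })
    }
}
```

The design choice here is that the deserializer never has to know sequence lengths; all partitioning logic lives in ordinary Rust code that can be tested on its own.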
Point 3: handling columnar files
These are files like the ones we have in GGG, with a table of data written with a known format string but with column headers. We can probably assume that the file as a whole would deserialize into one of:
These are probably going to require a special deserializer that stores the column names and iterates through them in sync with the format specs. The tricky part is whether I'll be able to handle the line-directed input correctly, or whether the stateless nature of deserializers will pose a problem. For the

```rust
use serde::de::DeserializeOwned;
use std::io::BufRead;

fn vec_from_table<R: BufRead, D: DeserializeOwned>(reader: R, header: &[&str]) -> Vec<D>
```

where this would read one line at a time from the file and call

The other types are more difficult; it's not clear at the moment how we would distinguish between the top-level struct, map, or dataframe (which should use the header provided) and the inner types (which should not).
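The line-at-a-time loop for the `Vec` case can be sketched with std only, under several stated assumptions: the first line of the file holds whitespace-separated column names that must match `header`; data columns are split on whitespace rather than read with the fixed-width format specs; and a caller-supplied closure `parse_row` (hypothetical) stands in for calling the deserializer on each line, receiving a map from column name to raw text. It returns a `Result` rather than a bare `Vec<D>` so that malformed lines can be reported.

```rust
use std::collections::HashMap;
use std::io::BufRead;

/// Hypothetical sketch of reading a columnar file one line at a time.
/// `parse_row` stands in for running the deserializer on a single line; it
/// receives that line's values keyed by column name.
fn vec_from_table<R, D, F>(reader: R, header: &[&str], mut parse_row: F) -> Result<Vec<D>, String>
where
    R: BufRead,
    F: FnMut(&HashMap<&str, &str>) -> Result<D, String>,
{
    let mut lines = reader.lines();

    // Assumption: the first line holds the column names, separated by whitespace.
    let first = lines
        .next()
        .ok_or("missing header line")?
        .map_err(|e| e.to_string())?;
    let found: Vec<&str> = first.split_whitespace().collect();
    if found != header {
        return Err(format!("header mismatch: expected {:?}, found {:?}", header, found));
    }

    let mut rows = Vec::new();
    for (i, line) in lines.enumerate() {
        let line = line.map_err(|e| e.to_string())?;
        if line.trim().is_empty() {
            continue; // skip blank lines
        }
        // Assumption: whitespace-separated values rather than fixed-width fields.
        let values: Vec<&str> = line.split_whitespace().collect();
        if values.len() != header.len() {
            return Err(format!(
                "line {}: expected {} values, found {}",
                i + 2, // +1 for the header line, +1 for 1-based numbering
                header.len(),
                values.len()
            ));
        }
        // Pair each value with its column name, then hand the row to the caller.
        let row: HashMap<&str, &str> = header.iter().copied().zip(values).collect();
        rows.push(parse_row(&row)?);
    }
    Ok(rows)
}
```

Because each line is parsed independently, the per-line deserializer can stay stateless; only this outer loop carries the header.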
Point 4: handling alternate struct deserialization
I have this written so that the default way of deserializing structs is to treat them like a tuple and simply deserialize their fields in the order they appear in the struct. However, we may want to support alternative ways of deserializing that don't rely on order. Specifically:
This isn't hard to switch for a full deserialization, but what if you had something like:

```rust
struct Outer {
    site_id: String,
    met: Inner,
}

struct Inner {
    pres: f32,
    temp: f32,
    rhum: f32,
}
```

and this needed to be deserialized from the format

such that the

```rust
use serde::Deserialize;
use std::collections::HashMap;
use std::convert::TryFrom;

impl TryFrom<(String, HashMap<String, f32>)> for Outer { ... }

#[derive(Deserialize)]
#[serde(try_from = "(String, HashMap<String, f32>)")]
struct Outer {
    site_id: String,
    met: Inner,
}
```
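The body of the `TryFrom` impl above is elided; one possible version is sketched here, assuming the map's keys are exactly the `Inner` field names `pres`, `temp` and `rhum`, and that missing or unknown names should be rejected. It is std-only, so the serde derive and `try_from` attribute are omitted; serde requires the conversion's error type to implement `Display`, which `String` does.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct Inner {
    pres: f32,
    temp: f32,
    rhum: f32,
}

#[derive(Debug, PartialEq)]
struct Outer {
    site_id: String,
    met: Inner,
}

// Hypothetical conversion: the deserializer hands over the site ID plus a map
// of name -> value pairs, so building `Inner` no longer depends on field order.
impl TryFrom<(String, HashMap<String, f32>)> for Outer {
    // serde's `try_from` attribute needs an error type implementing Display.
    type Error = String;

    fn try_from((site_id, mut fields): (String, HashMap<String, f32>)) -> Result<Self, Self::Error> {
        // Remove each expected field by name, failing if it is absent.
        let mut take = |name: &str| {
            fields
                .remove(name)
                .ok_or_else(|| format!("missing field `{}`", name))
        };
        let met = Inner {
            pres: take("pres")?,
            temp: take("temp")?,
            rhum: take("rhum")?,
        };
        // Anything left over is a name `Inner` does not know about.
        if let Some(extra) = fields.keys().next() {
            return Err(format!("unexpected field `{}`", extra));
        }
        Ok(Outer { site_id, met })
    }
}
```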
Serialization is done for the basic formatting field types. Deserialization has a few types remaining (none, newtypes, enums), but since I've worked out how I want to serialize them, it's clearer how deserialization should work. Once those types are implemented, I will close this. I may still implement an alternative method of handling structs/maps, where the field names are written as fields in the output. I haven't decided whether this should be an option in the settings or a separate serializer; it will depend on how much more complicated it makes the logic.
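As a rough illustration of the two struct/map layouts being weighed here, the hypothetical `write_record` helper below (not this crate's API) formats named `f64` fields either positionally, like an `Fw.d` edit descriptor, or with each field name written first as an `Aw` field. A boolean parameter stands in for the "option in the settings" approach. It copies two Fortran output conventions: a value too wide for its field is printed as asterisks, and a name longer than the field keeps only its leftmost characters.

```rust
/// Hypothetical helper (not this crate's API): format `fields` as one record,
/// each value written like a Fortran `F<width>.<precision>` edit descriptor.
/// With `with_names`, each field name is written first, like an `A<width>` field.
fn write_record(fields: &[(&str, f64)], width: usize, precision: usize, with_names: bool) -> String {
    let mut out = String::new();
    for &(name, value) in fields {
        if with_names {
            // `Aw` output: keep the leftmost `width` characters, right-justified.
            let name: String = name.chars().take(width).collect();
            out.push_str(&format!("{:>width$}", name, width = width));
        }
        let text = format!("{:>width$.prec$}", value, width = width, prec = precision);
        if text.len() > width {
            // Like Fortran, fill a field the value does not fit in with asterisks.
            out.push_str(&"*".repeat(width));
        } else {
            out.push_str(&text);
        }
    }
    out
}
```

For `[("pres", 1013.2), ("temp", 288.1)]` with width 8 and precision 2, the positional form gives `" 1013.20  288.10"` and the named form gives `"    pres 1013.20    temp  288.10"`. Since the named form only interleaves extra fields, a settings flag may be enough, though a separate serializer would keep the positional path simpler.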
It would be extremely convenient to be able to use serde to (de)serialize appropriate structs to and from Fortran-style records.