Better browsing #199

Merged
merged 12 commits into from
Feb 8, 2023
102 changes: 79 additions & 23 deletions README.md
@@ -11,6 +11,25 @@
UnROOT.jl is a reader for the [CERN ROOT](https://root.cern) file format
written entirely in Julia, without any dependence on ROOT or Python.

## Important API changes in v0.9.0

We decided to alter the behaviour of `getindex(f::ROOTFile, s::AbstractString)`, which is the method
called when `f["foo/bar"]` is used. Before `v0.9.0`, `UnROOT` tried to make a best guess
and return a tree, a branch, or even fully parsed data. This led to two major issues:

1. Errors blocked any further exploration as soon as `UnROOT` hit something it could not interpret, even when the user had not asked for it (e.g. a single uninterpretable branch in a tree whose other branches would work fine).
2. Unpredictable behaviour (type instability): the value of the path determines the type of the returned data.

Starting from `v0.9.0`, we introduce an interface where `f["..."]` always returns genuine ROOT datatypes (or custom ones if you provide interpretations), and the actual parsing is only performed when explicitly requested by the user via helper methods like `LazyBranch(f, "...")`.

Long story short, the following pattern can be used to fix your code when upgrading to `v0.9.0`:

`f["foo/bar"]` => `LazyBranch(f, "foo/bar")`

The `f["foo/bar"]` accessor should now work on almost all files and is a handy utility to explore the ROOT data structures.

See [PR199](https://github.com/JuliaHEP/UnROOT.jl/pull/199) for more details.
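To make the type-instability point above concrete, here is a minimal, self-contained Julia sketch. It does not use UnROOT at all: `ToyNode`, `ToyBranch`, `ToyTree`, `TOY_FILE`, `old_lookup`, `new_lookup` and `toy_lazybranch` are hypothetical names that loosely stand in for the pre-`v0.9.0` `f["..."]`, the new `f["..."]`, and `LazyBranch`. `Base.return_types` is Julia's (unexported) reflection helper for inspecting what the compiler infers.

```julia
# Hypothetical toy model of the v0.9.0 change; none of these names exist in UnROOT.
abstract type ToyNode end

struct ToyBranch <: ToyNode
    name::String
    values::Vector{Float32}
end

struct ToyTree <: ToyNode
    name::String
    branches::Vector{ToyBranch}
end

const PT = ToyBranch("pt", Float32[1.5, 2.5])
const TOY_FILE = Dict{String,ToyNode}(
    "Events" => ToyTree("Events", [PT]),
    "Events/pt" => PT,
)

# Pre-v0.9.0 style: the *value* of `path` decides whether parsed data or a node comes back.
function old_lookup(path::String)
    node = TOY_FILE[path]
    return node isa ToyBranch ? node.values : node
end

# v0.9.0 style: the lookup always returns a node (a ROOT-like datatype) ...
new_lookup(path::String) = TOY_FILE[path]

# ... and parsing is a separate, explicit step (the role `LazyBranch` plays).
toy_lazybranch(path::String) = (TOY_FILE[path]::ToyBranch).values

# Compare the inferred return types of the two accessors.
println(Base.return_types(old_lookup, (String,)))
println(Base.return_types(new_lookup, (String,)))
```

On current Julia versions the first call should report a `Union` that mixes `Vector{Float32}` with `ToyNode`, while the second reports only `ToyNode`: callers of the new-style accessor always know what they get back, and the expensive, failure-prone parsing happens only where it is asked for.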

## Installation Guide
1. Download the latest [Julia release](https://julialang.org/downloads/)
2. Open up Julia REPL (hit `]` once to enter Pkg mode, hit backspace to exit it)
@@ -27,24 +46,31 @@ julia> using UnROOT
julia> f = ROOTFile("test/samples/NanoAODv5_sample.root")
ROOTFile with 2 entries and 21 streamers.
test/samples/NanoAODv5_sample.root
└─ Events
├─ "run"
├─ "luminosityBlock"
├─ "event"
├─ "HTXS_Higgs_pt"
├─ "HTXS_Higgs_y"
└─ "⋮"
├─ Events (TTree)
│ ├─ "run"
│ ├─ "luminosityBlock"
│ ├─ "event"
│ ├─ "⋮"
│ ├─ "L1_UnpairedBunchBptxPlus"
│ ├─ "L1_ZeroBias"
│ └─ "L1_ZeroBias_copy"
└─ untagged (TObjString)


julia> mytree = LazyTree(f, "Events", ["Electron_dxy", "nMuon", r"Muon_(pt|eta)$"])
Row │ Electron_dxy nMuon Muon_eta Muon_pt
│ Vector{Float32} UInt32 Vector{Float32} Vector{Float32}
─────┼───────────────────────────────────────────────────────────
1 │ [0.000371] 0 [] []
2 │ [-0.00982] 2 [0.53, 0.229] [19.9, 15.3]
3 │ [] 0 [] []
4 │ [-0.00157] 0 [] []
⋮ │ ⋮ ⋮ ⋮ ⋮

Row │ Electron_dxy nMuon Muon_pt Muon_eta
│ SubArray{Float3 UInt32 SubArray{Float3 SubArray{Float3
─────┼────────────────────────────────────────────────────────────────────────────
1 │ [0.000371] 0 [] []
2 │ [-0.00982] 2 [19.9, 15.3] [0.53, 0.229]
3 │ [] 0 [] []
4 │ [-0.00157] 0 [] []
5 │ [] 0 [] []
6 │ [-0.00126] 0 [] []
7 │ [0.0612, 0.000642] 2 [22.2, 4.43] [-1.13, 1.98]
8 │ [0.00587, 0.000549, -0.00617] 0 [] []
⋮ │ ⋮ ⋮ ⋮ ⋮
992 rows omitted
```

### RNTuple
@@ -57,20 +83,30 @@ julia> using UnROOT
julia> f = ROOTFile("./test/samples/RNTuple/test_ntuple_stl_containers.root");

julia> f["ntuple"]
UnROOT.RNTuple:
header:
UnROOT.RNTuple with 5 rows, 13 fields, and metadata:
header:
name: "ntuple"
ntuple_description: ""
writer_identifier: "ROOT v6.29/01"
schema:
schema:
RNTupleSchema with 13 top fields
├─ :lorentz_vector ⇒ Struct
├─ :vector_tuple_int32_string ⇒ Vector
├─ :string ⇒ String
├─ :vector_string ⇒ Vector
...
..
.
├─ :vector_vector_int32 ⇒ Vector
├─ :vector_variant_int64_string ⇒ Vector
├─ :vector_vector_string ⇒ Vector
├─ :variant_int32_string ⇒ Union
├─ :array_float ⇒ StdArray{3}
├─ :tuple_int32_string ⇒ Struct
├─ :array_lv ⇒ StdArray{3}
├─ :pair_int32_string ⇒ Struct
└─ :vector_int32 ⇒ Vector

footer:
cluster_summaries: UnROOT.ClusterSummary[ClusterSummary(num_first_entry=0, num_entries=5)]

julia> LazyTree(f, "ntuple")
Row │ string vector_int32 array_float vector_vector_i vector_string vector_vector_s variant_int32_s vector_variant_ ⋯
│ String Vector{Int32} StaticArraysCor Vector{Vector{I Vector{String} Vector{Vector{S Union{Int32, St Vector{Union{In ⋯
@@ -109,11 +145,31 @@ XRootD is also supported, depending on the protocol:
- (1.6+ only) or the "url" has to start with `root://` and have another `//` to separate server and file path
```julia
julia> r = @time ROOTFile("https://scikit-hep.org/uproot3/examples/Zmumu.root")
0.034877 seconds (5.13 k allocations: 533.125 KiB)
3.284499 seconds (13.10 M allocations: 670.450 MiB, 4.62% gc time, 93.34% compilation time)
ROOTFile with 1 entry and 18 streamers.
https://scikit-hep.org/uproot3/examples/Zmumu.root
└─ events (TTree)
├─ "Type"
├─ "Run"
├─ "Event"
├─ "⋮"
├─ "phi2"
├─ "Q2"
└─ "M"

julia> r = ROOTFile("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root")

ROOTFile with 1 entry and 19 streamers.
root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root
└─ Events (TTree)
├─ "run"
├─ "luminosityBlock"
├─ "event"
├─ "⋮"
├─ "Electron_dxyErr"
├─ "Electron_dz"
└─ "Electron_dzErr"

```
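The `root://` rule in the list above can be illustrated with a small sketch. `split_xrootd_url` is a hypothetical helper written only for illustration; it is not part of UnROOT's API and says nothing about how UnROOT parses XRootD URLs internally. It assumes the rule exactly as stated: the URL starts with `root://`, and the first `//` after the server name separates the server from an absolute file path.

```julia
# Hypothetical helper illustrating the `root://server//path` convention; not UnROOT code.
function split_xrootd_url(url::String)
    prefix = "root://"
    startswith(url, prefix) || throw(ArgumentError("URL must start with $prefix"))
    rest = url[length(prefix)+1:end]          # drop the scheme (the prefix is ASCII)
    sep = findfirst("//", rest)               # first `//` after the server name
    sep === nothing && throw(ArgumentError("missing `//` between server and file path"))
    server = rest[1:prevind(rest, first(sep))]
    path = rest[last(sep):end]                # keep one leading '/' so the path is absolute
    return server, path
end

server, path = split_xrootd_url(
    "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root")
println(server)  # eospublic.cern.ch
println(path)    # /eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root
```

A URL such as `root://host/file.root`, which has only a single `/` after the server, is rejected by this sketch because the server/path separator is missing.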

## TBranch of custom struct
9 changes: 9 additions & 0 deletions src/displays.jl
@@ -46,6 +46,15 @@ function children(t::TTree)
return ks
end
end
children(t::Union{TTree, TBranchElement}) = t.fBranches
function Base.show(io::IO, ::MIME"text/plain", b::Union{TTree, TBranchElement})
if isempty(b.fBranches)
print(io, b)
else
print_tree(io, b)
end
end
printnode(io::IO, t::TBranchElement) = print(io, "$(t.fName)")
printnode(io::IO, t::TTree) = print(io, "$(t.fName) (TTree)")
printnode(io::IO, f::ROOTFile) = print(io, f.filename)
printnode(io::IO, f::ROOTDirectory) = print(io, "$(f.name) (TDirectory)")
3 changes: 2 additions & 1 deletion src/iteration.jl
@@ -122,6 +122,7 @@ mutable struct LazyBranch{T,J,B} <: AbstractVector{T}
[0:-1 for _ in 1:Threads.nthreads()])
end
end
LazyBranch(f::ROOTFile, s::AbstractString) = LazyBranch(f, _getindex(f, s))
basketarray(lb::LazyBranch, ithbasket) = basketarray(lb.f, lb.b, ithbasket)
basketarray_iter(lb::LazyBranch) = basketarray_iter(lb.f, lb.b)

@@ -404,7 +405,7 @@ function LazyTree(f::ROOTFile, tree::TTree, s, branches)
replace!(tail, "fCoordinates" => "")
norm_name = join([head; join(tail)], "_")
end
d[Symbol(norm_name)] = f["$s/$b"]
d[Symbol(norm_name)] = LazyBranch(f, "$s/$b")
end
return LazyTree(NamedTuple{Tuple(keys(d))}(values(d)))
end
12 changes: 2 additions & 10 deletions src/root.jl
@@ -135,15 +135,7 @@ end


function Base.getindex(f::ROOTFile, s::AbstractString)
S = _getindex(f, s)
if S isa Union{TBranch, TBranchElement}
# try # if we can't construct LazyBranch, just give up (maybe due to custom class)
return LazyBranch(f, S)
# catch
# @warn "Can't automatically create LazyBranch for branch $s. Returning a branch object"
# end
end
S
_getindex(f, s)
end

@memoize LRU(maxsize = 2000) function _getindex(f::ROOTFile, s)
@@ -402,7 +394,6 @@ function auto_T_JaggT(f::ROOTFile, branch; customstructs::Dict{String, Type})
else
leaftype = _normalize_ftype(leaf.fType)
_type = get(_leaftypeconstlookup, leaftype, nothing)
isnothing(_type) && error("Cannot interpret type.")
if branch.fType == Const.kSubbranchSTLCollection
_type = Vector{_type}
end
@@ -486,6 +477,7 @@ function readbasketseek(f::ROOTFile, branch::Union{TBranch, TBranchElement}, see
basketkey = unpack(rawbuffer, TBasketKey)
compressedbytes = compressed_datastream(rawbuffer, basketkey)

@debug "Seek position: $seek_pos"
basketrawbytes = decompress_datastreambytes(compressedbytes, basketkey)

@debug begin
4 changes: 4 additions & 0 deletions src/streamers.jl
@@ -31,6 +31,7 @@ struct TStreamerInfo{T}
end

function unpack(io, tkey::TKey, refs::Dict{Int32, Any}, T::Type{TStreamerInfo})
@debug "Unpacking: $(tkey)"
preamble = Preamble(io, T)
fName, fTitle = nametitle(io)
fCheckSum = readtype(io, UInt32)
@@ -305,8 +306,11 @@ struct TObjArray
elements
end
Base.getindex(obj::TObjArray, index) = obj.elements[index]
Base.length(a::TObjArray) = length(a.elements)
Base.iterate(a::TObjArray, state=1) = state > length(a) ? nothing : (a.elements[state], state+1)

function unpack(io, tkey::TKey, refs::Dict{Int32, Any}, T::Type{TObjArray})
@debug "Unpacking: $(tkey)"
preamble = Preamble(io, T)
skiptobj(io)
name = readtype(io, String)
3 changes: 2 additions & 1 deletion src/types.jl
@@ -151,7 +151,8 @@ function decompress_datastreambytes(compbytes, tkey)
compression_header = unpack(io, CompressionHeader)
cname, _, compbytes, uncompbytes = unpack(compression_header)
rawbytes = read(io, compbytes)
@debug cname
@debug "Compression type: $(cname)"
@debug "Compressed/uncompressed size in bytes: $(compbytes) / $(uncompbytes)"

if cname == "L4"
# skip checksum which is 8 bytes
44 changes: 22 additions & 22 deletions test/runtests.jl
@@ -181,13 +181,13 @@ end
close(rootfile)

rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_large_array_lz4.root"))
arr = collect(rootfile["t1/float_array"])
arr = collect(LazyBranch(rootfile, rootfile["t1/float_array"]))
@test 100000 == length(arr)
@test [0.0, 1.0588236, 2.1176472, 3.1764705, 4.2352943] ≈ arr[1:5] atol=1e-7
close(rootfile)

rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_int_array_zstd.root"))
arr = collect(rootfile["t1/a"])
arr = collect(LazyBranch(rootfile, "t1/a"))
@test arr == 0:99
close(rootfile)
end
@@ -249,7 +249,7 @@ end
@testset "TLorentzVector" begin
# 64bits T
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "TLorentzVector.root"))
branch = rootfile["t1/LV"]
branch = LazyBranch(rootfile, "t1/LV")
tree = LazyTree(rootfile, "t1")

@test branch[1].x == 1.0
@@ -262,7 +262,7 @@ end

# jagged LVs
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "Jagged_TLorentzVector.root"))
branch = rootfile["t1/LVs"]
branch = LazyBranch(rootfile, "t1/LVs")
tree = LazyTree(rootfile, "t1")

@test eltype(branch) <: AbstractVector{LorentzVectors.LorentzVector{Float64}}
@@ -273,7 +273,7 @@ end

@testset "TNtuple" begin
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "TNtuple.root"))
arrs = [collect(rootfile["n1/$c"]) for c in "xyz"]
arrs = [collect(LazyBranch(rootfile, "n1/$c")) for c in "xyz"]
@test length.(arrs) == fill(100, 3)
@test arrs[1] ≈ 0:99
@test arrs[2] ≈ arrs[1] .+ arrs[1] ./ 13
@@ -284,7 +284,7 @@ end
@testset "Singly jagged branches" begin
# 32bits T
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_jagged_array.root"))
data = rootfile["t1/int32_array"]
data = LazyBranch(rootfile, "t1/int32_array")
@test data[1] == Int32[]
@test data[1:2] == [Int32[], Int32[0]]
@test data[end] == Int32[90, 91, 92, 93, 94, 95, 96, 97, 98]
@@ -293,7 +293,7 @@ end
# 64bits T
T = Float64
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_jagged_array_double.root"))
data = rootfile["t1/double_array"]
data = LazyBranch(rootfile, "t1/double_array")
@test data isa AbstractVector
@test eltype(data) <: AbstractVector{T}
@test data[1] == T[]
@@ -326,11 +326,11 @@ end
vvi = [[[2], [3, 5]], [[7, 9, 11], [13]], [[17], [19], []], [], [[]]]
vvf = [[[2.5], [3.5, 5.5]], [[7.5, 9.5, 11.5], [13.5]], [[17.5], [19.5], []], [], [[]]]
@test UnROOT.array(rootfile, "t1/bi") == vvi
@test rootfile["t1/bi"] == vvi
@test eltype(eltype(eltype(rootfile["t1/bi"]))) === Int32
@test LazyBranch(rootfile, "t1/bi") == vvi
@test eltype(eltype(eltype(LazyBranch(rootfile, "t1/bi")))) === Int32
@test UnROOT.array(rootfile, "t1/bf") == vvf
@test rootfile["t1/bf"] == vvf
@test eltype(eltype(eltype(rootfile["t1/bf"]))) === Float32
@test LazyBranch(rootfile, "t1/bf") == vvf
@test eltype(eltype(eltype(LazyBranch(rootfile, "t1/bf")))) === Float32
close(rootfile)
end

@@ -365,7 +365,7 @@ end
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "NanoAODv5_sample.root"))
event = UnROOT.array(rootfile, "Events/event")
@test event[1:3] == UInt64[12423832, 12423821, 12423834]
Electron_dxy = rootfile["Events/Electron_dxy"]
Electron_dxy = LazyBranch(rootfile, "Events/Electron_dxy")
@test eltype(Electron_dxy) == SubArray{Float32, 1, Vector{Float32}, Tuple{UnitRange{Int64}}, true}
@test Electron_dxy[1:3] ≈ [Float32[0.0003705], Float32[-0.00981903], Float32[]]
HLT_Mu3_PFJet40 = UnROOT.array(rootfile, "Events/HLT_Mu3_PFJet40")
@@ -377,7 +377,7 @@ end
tree = LazyTree(rootfile, "Events", r"Muon_(pt|eta)$")
@test sort(propertynames(tree) |> collect) == sort([:Muon_pt, :Muon_eta])
@test occursin("LazyEvent", repr(first(iterate(tree))))
@test sum(rootfile["Events/HLT_Mu3_PFJet40"]) == 443
@test sum(LazyBranch(rootfile, "Events/HLT_Mu3_PFJet40")) == 443
close(rootfile)
end

@@ -467,9 +467,9 @@ end
"KM3NETDAQ::JDAQEvent.KM3NETDAQ::JDAQEventHeader" => UnROOT._KM3NETDAQEventHeader
)
f_auto = UnROOT.ROOTFile(joinpath(SAMPLES_DIR, "km3net_online.root"), customstructs=customstructs)
headers_auto = f_auto["KM3NET_EVENT/KM3NET_EVENT/KM3NETDAQ::JDAQEventHeader"]
event_hits_auto = f_auto["KM3NET_EVENT/KM3NET_EVENT/snapshotHits"]
event_thits_auto = f_auto["KM3NET_EVENT/KM3NET_EVENT/triggeredHits"]
headers_auto = LazyBranch(f_auto, "KM3NET_EVENT/KM3NET_EVENT/KM3NETDAQ::JDAQEventHeader")
event_hits_auto = LazyBranch(f_auto, "KM3NET_EVENT/KM3NET_EVENT/snapshotHits")
event_thits_auto = LazyBranch(f_auto, "KM3NET_EVENT/KM3NET_EVENT/triggeredHits")

for event_hits ∈ [event_hits_manual, event_hits_auto]
@test length(event_hits) == 3
@@ -592,7 +592,7 @@ end
rootfile = UnROOT.samplefile("cms_ntuple_wjet.root")
pts1 = UnROOT.array(rootfile, "variable/met_p4/fCoordinates/fCoordinates.fPt"; raw=false)
pts2 = LazyTree(rootfile, "variable", [r"met_p4/fCoordinates/.*", "mll"])[!, Symbol("met_p4_fPt")]
pts3 = rootfile["variable/good_jets_p4/good_jets_p4.fCoordinates.fPt"]
pts3 = LazyBranch(rootfile, "variable/good_jets_p4/good_jets_p4.fCoordinates.fPt")
@test 24 == length(pts1)
@test Float32[69.96958, 25.149912, 131.66693, 150.56802] == pts1[1:4]
@test pts1 == pts2
@@ -601,7 +601,7 @@ end

# issue 61
rootfile = UnROOT.samplefile("issue61.root")
@test rootfile["Events/Jet_pt"][:] == Vector{Float32}[[], [27.324587, 24.889547, 20.853024], [], [20.33066], [], []]
@test LazyBranch(rootfile, "Events/Jet_pt")[:] == Vector{Float32}[[], [27.324587, 24.889547, 20.853024], [], [20.33066], [], []]
close(rootfile)

# issue 78
@@ -615,8 +615,8 @@ end
# unsigned short -> Int16, ulong64 -> UInt64
# file minified with `rooteventselector --recreate -l 2 "trackntuple.root:trackingNtuple/tree" issue108_small.root`
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "issue108_small.root"))
@test rootfile["tree/trk_algoMask"][2] == [0x0000000000004000, 0x0000000000004000, 0x0000000000004000, 0x0000000000004000]
@test rootfile["tree/pix_ladder"][3][1:5] == UInt16[0x0001, 0x0001, 0x0001, 0x0001, 0x0003]
@test LazyBranch(rootfile, "tree/trk_algoMask")[2] == [0x0000000000004000, 0x0000000000004000, 0x0000000000004000, 0x0000000000004000]
@test LazyBranch(rootfile, "tree/pix_ladder")[3][1:5] == UInt16[0x0001, 0x0001, 0x0001, 0x0001, 0x0003]
close(rootfile)

# issue 116
@@ -747,10 +747,10 @@ end
@test sort(keys(f["mydir"])) == ["Events", "c", "d", "mysubdir"]
@test sort(keys(f["mydir/mysubdir"])) == ["e", "f"]
@test sum(length.(LazyTree(f, "mydir/Events").Jet_pt)) == 4
@test sum(length.(f["mydir/Events/Jet_pt"])) == 4
@test sum(length.(LazyBranch(f, "mydir/Events/Jet_pt"))) == 4

f = UnROOT.samplefile("issue11_tdirectory.root")
@test sum(f["Data/mytree/Particle0_E"]) ≈ 1012.0
@test sum(LazyBranch(f, "Data/mytree/Particle0_E")) ≈ 1012.0
end

@testset "Basic C++ types" begin