Investigate listDatasetFileDetails DBSClient API #5

Closed
vkuznet opened this issue Feb 9, 2022 · 1 comment
vkuznet commented Feb 9, 2022

During testing of the WMCore DBS3Reader, I found that both the Python and Go servers take a significant amount of time on the following DBS3Reader API call:

     dataset="/VBF1Parked/Run2012D-v1/RAW"
     res = reader.listDatasetFileDetails(dataset)

This API call itself contains several nested for loops and receives data from further DBS calls (getFileListByDataset, listFileParents and listFileLumis):

        fileDetails = self.getFileListByDataset(dataset=datasetPath, validFileOnly=validFileOnly, detail=True)
        blocks = set()  # the set of blocks of the dataset
        # Iterate over the files and prepare the set of blocks and a dict where the keys are the files
        files = {}
        for f in fileDetails:
            blocks.add(f['block_name'])
            files[f['logical_file_name']] = remapDBS3Keys(f, stringify=True)
            files[f['logical_file_name']]['ValidFile'] = f['is_file_valid']
            files[f['logical_file_name']]['Lumis'] = {}
            files[f['logical_file_name']]['Parents'] = []
        # Iterate over the blocks and get parents and lumis
        for blockName in blocks:
            # get the parents
            if getParents:
                parents = self.dbs.listFileParents(block_name=blockName)
                for p in parents:
                    if p['logical_file_name'] in files:  # invalid files are not there if validFileOnly=1
                        files[p['logical_file_name']]['Parents'].extend(p['parent_logical_file_name'])

            if getLumis:
                # get the lumis
                file_lumis = self.dbs.listFileLumis(block_name=blockName)
                for f in file_lumis:
                    if f['logical_file_name'] in files:  # invalid files are not there if validFileOnly=1
                        if f['run_num'] in files[f['logical_file_name']]['Lumis']:
                            files[f['logical_file_name']]['Lumis'][f['run_num']].extend(f['lumi_section_num'])
                        else:
                            files[f['logical_file_name']]['Lumis'][f['run_num']] = f['lumi_section_num']

        return files
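Counting only the calls visible in the snippet above (whatever getFileListByDataset does internally is not shown here), the number of DBS requests grows linearly with the number of blocks in the dataset. A rough sketch of that count follows; the helper name `count_dbs_calls` is made up for illustration and is not part of WMCore:

```python
def count_dbs_calls(n_blocks, get_parents, get_lumis):
    """Count the DBS calls visible in the snippet for a dataset with n_blocks blocks.

    Hypothetical helper: one getFileListByDataset call up front, then one
    listFileParents call per block if get_parents is set and one
    listFileLumis call per block if get_lumis is set.
    """
    per_block = int(bool(get_parents)) + int(bool(get_lumis))
    return 1 + n_blocks * per_block


# A dataset split into 100 blocks, with both parents and lumis requested.
print(count_dbs_calls(100, get_parents=True, get_lumis=True))
```

So a dataset with many blocks issues many sequential requests, and each response is then walked again in a client-side loop.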

I suggest providing individual APIs so that we can better understand the time spent in each of them and within the internal (nested) for loops of this DBS3Reader API call.
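One possible way to get such a breakdown is to wrap each per-block call and the client-side merging in a small timer. This is only a sketch under stated assumptions: `StageTimer`, `FakeDBS` and `profile_block_loop` are hypothetical names, and `FakeDBS` is an offline stand-in that returns canned records shaped like the ones used in the loops above, so that the example runs without a DBS server.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time and call counts per named stage."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the stage even if the wrapped call raises.
            self.seconds[name] += time.perf_counter() - start
            self.calls[name] += 1


class FakeDBS:
    """Hypothetical offline stand-in for the DBS client (self.dbs above)."""

    def listFileParents(self, block_name):
        return [{"logical_file_name": "a.root",
                 "parent_logical_file_name": ["p.root"]}]

    def listFileLumis(self, block_name):
        return [{"logical_file_name": "a.root", "run_num": 1,
                 "lumi_section_num": [1, 2]}]


def profile_block_loop(dbs, blocks, timer):
    """Time each per-block DBS call separately from the client-side merging."""
    parents, lumis = [], []
    for block_name in blocks:
        with timer.stage("listFileParents"):
            block_parents = dbs.listFileParents(block_name=block_name)
        with timer.stage("listFileLumis"):
            block_lumis = dbs.listFileLumis(block_name=block_name)
        with timer.stage("merge"):
            parents.extend(block_parents)
            lumis.extend(block_lumis)
    return parents, lumis


timer = StageTimer()
profile_block_loop(FakeDBS(), ["/block#1", "/block#2"], timer)
for name in sorted(timer.seconds):
    print(f"{name}: {timer.calls[name]} calls, {timer.seconds[name]:.6f} s")
```

With a real client in place of `FakeDBS`, the per-stage totals would show whether the time goes into the server calls or into the loops on the client side.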

vkuznet commented Jul 22, 2022

this is done now, closing the ticket.

vkuznet closed this as completed Jul 22, 2022