filter_$(otus/samples)_from_otu_table.py do not work, returned error: object dtype dtype('O') has no native HDF5 equivalent #2205

apascualgarcia · 2017-10-26T17:20:07Z

I'm trying to use filter_otus_from_otu_table.py and filter_samples_from_otu_table.py with no success. The three files needed to reproduce the issues are here: test2git.zip.

Filtering with a file that lists a single observation (prueba.txt) works:

$ filter_otus_from_otu_table.py -i otu.2test.metagenomes.biom -o otu.metagenomes.prueba.biom -e prueba.txt --negate_ids_to_exclude

But if I ask for two observations (file prueba2.txt):

$ filter_otus_from_otu_table.py -i otu.2test.metagenomes.biom -o otu.metagenomes.prueba.biom -e prueba2.txt --negate_ids_to_exclude

it fails with: TypeError: Object dtype dtype('O') has no native HDF5 equivalent

The same error appears if I reuse the one-observation list from the first example but omit the option --negate_ids_to_exclude. So the problem shows up when several observations/samples have to be handled, but not with a single one. The error also reproduces when I call biom directly:

$ biom subset-table -i otu.2test.metagenomes.biom -a observation -s prueba2.txt -o otu.2test.metagenomes.prueba.biom

A related issue in biom-format (#513) suggests that the metadata may be the problem. If I try to convert the table to JSON:

$ biom convert -i otu.2test.metagenomes.biom -o otu.2test.metagenomes.json.biom --table-type="OTU table" --to-json

I get this error: TypeError: array([u'["cathepsin L [EC:3.4.22.15]"]'], dtype=object) is not JSON serializable. And if I try to convert it to HDF5 with the suggested option --collapsed-samples:

$ biom convert -i otu.2test.metagenomes.biom -o otu.2test.metagenomes.hdf5.biom --table-type="OTU table" --to-hdf5 --collapsed-samples

I get TypeError: Object dtype dtype('O') has no native HDF5 equivalent. Please note that I checked that the fixes for this bug (#759) are included in my installed code. If it helps, I found a similar issue in the CellProfiler project (#995).
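A quick way to check whether metadata has the shape shown in the JSON error above (a one-element array wrapping a string that is itself JSON) is sketched below. This is only an illustration under stated assumptions: plain Python lists stand in for the NumPy object arrays, the IDs `obs_1` and `obs_2` and the function name `find_double_encoded` are hypothetical, and the check is a heuristic (a bare string such as "123" would also parse as JSON).

```python
import json


def find_double_encoded(metadata_by_id):
    """Return sorted (id, key) pairs whose value is a one-element
    sequence wrapping a string that parses as JSON."""
    suspects = []
    for obs_id, md in metadata_by_id.items():
        for key, value in md.items():
            is_single_string = (
                isinstance(value, (list, tuple))
                and len(value) == 1
                and isinstance(value[0], str)
            )
            if not is_single_string:
                continue
            try:
                json.loads(value[0])
            except ValueError:
                # Not JSON text, so this value is an ordinary string.
                continue
            suspects.append((obs_id, key))
    return sorted(suspects)


# Hypothetical metadata: obs_1 mimics the value from the error message,
# obs_2 holds an ordinary (non-JSON) string.
example = {
    "obs_1": {"KEGG_Description": ['["cathepsin L [EC:3.4.22.15]"]']},
    "obs_2": {"KEGG_Description": ["plain text description"]},
}
print(find_double_encoded(example))  # [('obs_1', 'KEGG_Description')]
```

With a real table, the dictionary could presumably be built from `table.ids(axis='observation')` and `table.metadata(axis='observation')`, converting each array to a list first.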


jairideout · 2017-10-26T17:26:38Z

Can you please post your question on the QIIME 1 forum? That's where we provide user support for QIIME 1.

apascualgarcia · 2017-10-26T17:44:14Z

Sure, although it may be an issue with biom-format (not specific to QIIME 1?).

jairideout · 2017-10-26T18:03:36Z

The QIIME 1 Forum is likely your best bet because you have a mixture of QIIME 1 and biom-format commands, but you could instead try the biom-format issue tracker. Please don't post in both locations, since many of the same developers monitor both. Either way, we don't provide user support for QIIME 1 or biom-format on this issue tracker. Thanks!

apascualgarcia · 2017-10-26T21:52:06Z

Sorry to keep replying here, but I think the following is worth posting because it clarifies the problem, in case someone finds this issue.

I managed to do the filtering by piecing together code that picrust uses to handle these tables. It confirms that the problem comes from the metadata:

import json

from biom import load_table
from picrust.util import write_biom_table, picrust_formatter

table = load_table('otu.2test.metagenomes.biom')

# Code found in picrust's categorize_by_function.py:
# metadata are not deserializing correctly. Duct tape it.
# Each metadata value arrives as a one-element array holding a JSON string,
# so decode it back into a plain Python object.
update_d = {}
for i, md in zip(table.ids(axis='observation'),
                 table.metadata(axis='observation')):
    update_d[i] = {k: json.loads(v[0]) for k, v in md.items()}
# Apply the repaired metadata once, after the loop has visited every observation.
table.add_metadata(update_d, axis='observation')

# Read the observation IDs to keep, one per line, skipping blank lines.
with open("prueba2.txt") as target:
    genes = [row.strip() for row in target if row.strip()]
table_red = table.filter(genes, axis='observation', inplace=False)

# Output in BIOM format, as found in picrust's predict_metagenomes.py
format_fs = {'KEGG_Description': picrust_formatter,
             'COG_Description': picrust_formatter,
             'KEGG_Pathways': picrust_formatter,
             'COG_Category': picrust_formatter}
write_biom_table(table_red, 'table.test.biom', format_fs=format_fs)  # HDF5
# write_biom_table(table_red, 'table.test.biom', write_hdf5=False, format_fs=format_fs)  # JSON
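The metadata repair in the snippet above can be tried in isolation, without biom or picrust installed. The sketch below is a minimal illustration: plain lists stand in for the NumPy object arrays, `decode_metadata` is a hypothetical helper name, and the `KEGG_Pathways` value is made up for the example (only the `KEGG_Description` string comes from the error message earlier in this thread).

```python
import json


def decode_metadata(md):
    """Decode each one-element, JSON-encoded metadata value
    into a plain Python object (same idea as json.loads(v[0]))."""
    return {key: json.loads(value[0]) for key, value in md.items()}


# Stand-in for one observation's metadata as loaded from the HDF5 table.
raw = {
    "KEGG_Description": ['["cathepsin L [EC:3.4.22.15]"]'],
    # Hypothetical value, only to show a nested list surviving the round trip.
    "KEGG_Pathways": ['[["Category A", "Subcategory B", "Pathway C"]]'],
}

fixed = decode_metadata(raw)
print(fixed["KEGG_Description"])  # ['cathepsin L [EC:3.4.22.15]']

# After decoding, the metadata is ordinary lists and strings,
# so JSON serialization no longer raises a TypeError.
serialized = json.dumps(fixed)
```

Once the values are plain lists of strings, a formatter such as `picrust_formatter` can presumably re-encode them on output, which would explain why writing the filtered table succeeds in the snippet above.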

jairideout closed this as completed Oct 26, 2017
