Discrepancy between API results and downloaded table from Census data.gov #44

Open
mdiep-cese opened this issue Apr 12, 2022 · 5 comments

Comments

@mdiep-cese

I am noticing a discrepancy between the data obtained through this API and the data obtained directly from the Census data website (via its table download feature).

The table I am looking at specifically is B18108 (disability-related data), but from the look of it, this affects other tables as well. I am using county-level data from the 2019 ACS 1-year estimates.

The following code is used:

result = censusdata.download('acs1', 2019, censusdata.censusgeo([('county', '*')]), datafields)
censusdata.export.exportcsv('censusdata-api.csv', result)

(The datafields are all the variables/values related to B18108)

The values themselves are not incorrect, but they seem to correspond to the wrong location/county. The first couple of rows are correct, but the subsequent rows are not. For example, the data for Los Angeles County obtained from the API matches Napa County in the downloaded table, and the Rock County, WI data from the API matches Scioto County, OH.

@steventrev

steventrev commented May 16, 2022

Can you post a full example? Here's my own, which shows that B18108_001E is identical to a CSV export from data.census.gov at the county level. The two tables do not list their rows in the same order (and you should not assume they do), but the values are not inaccurate once the rows are matched on NAME.

import pandas as pd
import censusdata as cd

datafields = ['B18108_001E']
result = cd.download('acs1', 2019, cd.censusgeo([('county', '*')]), datafields)
cd.export.exportcsv('censusdata-api.csv', result)
dfcd = pd.read_csv('censusdata-api.csv')
# dfcd.shape -> (840, 4)

#Downloaded B18108 table from 2019 ACS1 via https://data.census.gov/cedsci/table?q=B18108%3A%20AGE%20BY%20NUMBER%20OF%20DISABILITIES&g=0100000US%240500000&tid=ACSDT1Y2019.B18108
dfacs = pd.read_csv('ACSDT1Y2019.B18108.csv', skiprows=[1])
dfacs = dfacs[['B18108_001E', 'NAME']]
# dfacs.shape -> (840, 2)

df = dfacs.merge(dfcd, on='NAME', suffixes=['_acs', '_cd'])
df['B18108_001E_acs'].equals(df['B18108_001E_cd']) #True
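The same check can be made a little stricter. The following is only a sketch, not part of the package: `compare_on_key` is a hypothetical helper, and the two small frames hold placeholder values rather than real census figures. It assumes both tables carry a `NAME` column, as in the example above, and uses an outer merge so that counties missing from either side are reported instead of silently dropped.

```python
import pandas as pd


def compare_on_key(left, right, key, value):
    """Outer-join two tables on `key`; return unmatched rows and rows whose `value` differs.

    Hypothetical helper for illustration only.
    """
    merged = left[[key, value]].merge(
        right[[key, value]],
        on=key,
        how="outer",
        suffixes=("_acs", "_cd"),
        indicator=True,          # adds a `_merge` column: left_only / right_only / both
        validate="one_to_one",   # raises MergeError if a key repeats on either side
    )
    unmatched = merged[merged["_merge"] != "both"]
    both = merged[merged["_merge"] == "both"]
    differing = both[both[value + "_acs"] != both[value + "_cd"]]
    return unmatched, differing


# Placeholder rows (NOT real census values): same data, different row order.
acs = pd.DataFrame({"NAME": ["County A", "County B", "County C"],
                    "B18108_001E": [100, 200, 300]})
api = pd.DataFrame({"NAME": ["County C", "County A", "County B"],
                    "B18108_001E": [300, 100, 200]})

unmatched, differing = compare_on_key(acs, api, "NAME", "B18108_001E")

# Comparing row by row, without matching on NAME, reports a mismatch
# even though every county has the same value in both tables.
positional_equal = acs["B18108_001E"].equals(api["B18108_001E"])
```

With these placeholder frames, both `unmatched` and `differing` come back empty while `positional_equal` is `False`, which is the same pattern as values appearing to belong to the wrong county when two files are compared line by line.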

@datatalking

@steventrev thanks for replying. Now that @jtleider has left, are we keeping the data current? Is there a list of bugs, features, documentation, or other work that needs doing?

@steventrev

@datatalking - this package and its documentation continue to work presently. The package can support 2020 data by adding new tables to the /censusdata/variables/ path, which many forks (including my own) have done. I'm a greenhorn in this space, but will support where I can.

@datatalking

@steventrev are you supporting the censusdata package going forward from your repo? If so, I'd like to help. I'm new to the censusdata package but have used the underlying data for years. Hopefully my Python and other skills can be of use; I see this package as worth (some) maintaining.

@steventrev

steventrev commented Sep 8, 2022

@datatalking I doubt my capability beyond my refresh of the input files. Would a better course of action be to request the reins from @jtleider?
