Some useful R functions I've made. They haven't been fully tested or polished, but they're usable and hopefully useful for others.
ACS_import.R is a script to import Public Use Microdata Sample (PUMS) data from the US Census Bureau's American Community Survey, as exported to "export.csv" by the MDAT tool (https://data.census.gov/mdat/). See the disclaimers within the file.
ccdf.R makes complementary (empirical) cumulative distribution function plots, the best practice for plotting long-tailed distributions as recommended in Aaron Clauset, Cosma Rohilla Shalizi, and Mark E. J. Newman, "Power-law distributions in empirical data," SIAM Review 51, no. 3 (2009): 661-703, doi: 10.1137/070710111.
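For orientation, here is a minimal base-R sketch of the idea behind such a plot (not ccdf.R's actual interface): for each observed value x, plot the fraction of observations greater than or equal to x, on log-log axes.

    ## Minimal sketch of an empirical CCDF plot, not ccdf.R's actual interface.
    ccdf_sketch <- function(x, ...) {
      x <- sort(x)
      p <- 1 - (seq_along(x) - 1) / length(x)  # empirical Pr(X >= x[i])
      plot(x, p, log = "xy", xlab = "x", ylab = "Pr(X >= x)", ...)
    }

    set.seed(1)
    ccdf_sketch(rlnorm(1000, sdlog = 2))  # a heavy-tailed example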
igraph_triad_census_plot.R makes triad census dotplots with the actual triad subgraphs as labels, using the igraph package.
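igraph's triad_census() does the counting; a bare-bones version of such a plot (with plain text labels rather than the triad subgraphs this script draws) might look like:

    ## Bare-bones triad census dotplot; the script above replaces these text
    ## labels with drawings of the triad subgraphs themselves.
    library(igraph)

    g  <- sample_gnp(50, 0.05, directed = TRUE)  # toy directed graph
    tc <- triad_census(g)                        # counts of the 16 triad types
    man <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
             "030T", "030C", "201", "120D", "120U", "120C", "210", "300")
    dotchart(tc, labels = man, xlab = "count", main = "Triad census")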
network_triad_census.R does the same thing, but using the network and sna packages.
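Again only a rough sketch, not the script's actual code: sna's triad.census() returns the same sixteen counts, with the MAN labels as column names.

    ## Rough sna equivalent; triad.census() accepts adjacency matrices or
    ## network objects and returns a 1 x 16 matrix with MAN column names.
    library(sna)

    adj <- rgraph(50, tprob = 0.05)  # random directed adjacency matrix
    tc  <- triad.census(adj)
    dotchart(tc[1, ], xlab = "count", main = "Triad census (sna)")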
ptable.R is a function that makes Stata-like contingency tables, showing counts together with percentages of the column totals, like so:
                     1st         2nd         3rd         Total
    Didn't Survive  80 (37%)    97 (52.7%) 372 (75.8%) 549 (61.6%)
    Survived       136 (63%)    87 (47.3%) 119 (24.2%) 342 (38.4%)
    Total          216 (100%)  184 (100%)  491 (100%)  891 (100%)
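The function's own arguments aren't shown here, but the gist can be approximated in base R roughly as follows (a sketch, not ptable.R's actual implementation):

    ## Rough base-R approximation of the idea, not ptable.R itself:
    ## counts plus percentages of each column total, with marginal sums.
    ptable_sketch <- function(row, col) {
      counts <- addmargins(table(row, col))                        # counts with "Sum" margins
      pct <- sweep(counts, 2, counts[nrow(counts), ], "/") * 100   # percent of column totals
      noquote(matrix(sprintf("%d (%.1f%%)", as.integer(counts), pct),
                     nrow = nrow(counts), dimnames = dimnames(counts)))
    }

    with(mtcars, ptable_sketch(ifelse(am == 1, "Manual", "Automatic"), gear))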
as.matrix.table is a one-line function that should really be built into R but isn't: it converts a table to a matrix with the same dimensions, dimension names, and row/column names, so it prints just like the table.
    > tab <- table(data.frame(lower = sample(letters[1:5], size = 100, replace = T),
    +                         upper = sample(LETTERS[1:5], size = 100, replace = T)))
    > tab
         upper
    lower A B C D E
        a 5 1 4 4 3
        b 2 5 6 5 5
        c 2 4 3 2 5
        d 6 2 4 7 5
        e 3 4 4 3 6
    > class(tab)
    [1] "table"
    > as.matrix.table(tab)
         upper
    lower A B C D E
        a 5 1 4 4 3
        b 2 5 6 5 5
        c 2 4 3 2 5
        d 6 2 4 7 5
        e 3 4 4 3 6
    > class(as.matrix.table(tab))
    [1] "matrix" "array"
duplicates.R contains three functions for getting both the duplicate observations and the observations they duplicate (i.e., the union of duplicated() and duplicated(fromLast = TRUE), which is also how I actually do it; not the most efficient, but very lightweight). This is useful, for example, for comparing rows that are duplicates according to some subset of columns, to see why they are not complete duplicates across all columns. duplicates() returns the full set of duplicate observations, which.duplicates() gives their row indexes, and View.duplicates() opens a View() window of the duplicates for exploration.
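A minimal sketch of that core idea (the repo's definitions may differ in the details):

    ## Minimal sketch of the core idea; the repo's versions may differ.
    which_duplicates_sketch <- function(x) which(duplicated(x) | duplicated(x, fromLast = TRUE))
    duplicates_sketch       <- function(x) x[which_duplicates_sketch(x), , drop = FALSE]

    df <- data.frame(id = c(1, 2, 2, 3), grp = c("a", "b", "b", "c"))
    duplicates_sketch(df)  # rows 2 and 3: the duplicate and the row it duplicates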