THE CONCEPT (abridged version, taken from the MAT 259 course website): In any database, there lies hidden knowledge. What does a database contain, and what can MySQL queries reveal? Your first assignment is to find something of interest based on your own cultural / knowledge interests. Here are some options:
-
Topics of Cultural Interest
-
The Database Organizational Structure <- what I'll be looking at
-
Data Analytics Query Methods
- Use the MySQL Workbench to write a query by which to retrieve the data from the SPL database
- Use the spl_2016 database which gets updated daily.
Do a report on your research, and propose what to visualize based on time, or volume of activity. Possibly make a comparison between book(s) and movie(s) and soundtracks (cd).
Once you have all the material - click on "POST REPLY" to this link and add your info to complete the assignment.
I'm interested in knowing how maps are catalogued at the Seattle Public Library (SPL) in 2016. Maps create an interesting challenge for library and database organizers, becuase they exist beyond just the medium of text, but as images or collections of vector-based features. My research questions try to explore how the library organizes maps, and the structure by which they are made accessible.
Why am I asking this? It would be helpful to first know how many maps are available at SPL before knowing which ones are checked out, and before performing any analyses of cultural interest. This will be helpful for later visualizations that require normalization.
SELECT COUNT(barcode)
FROM spl_2016.inraw
WHERE itemtype LIKE '%map';
287
Since the barcode
value is distinct, we know that there are 287 entries in the database that are designated as "maps."
** Why am I asking this? ** The map collections can be an indication of the coverage of the globe by subject area. Collections with more maps may imply increased interest in that particular area, or perhaps, a greater diversity of map representation. For example, areas with more "things" will have more maps related to each of those "things," such hiking, camping, animal activity, etc.
SELECT COUNT(DISTINCT collcode)
FROM spl_2016.inraw
WHERE itemtype LIKE '%map';
8
There are 8 different types of map collections at SPL. What are they?
SELECT DISTINCT collcode
FROM spl_2016.inraw
WHERE itemtype LIKE '%map';
collcode |
---|
cadocpt |
camap |
camapr |
camus |
canf |
caref |
naatlr |
nanf |
Follow-up question... what do these collection designations mean?
We know from this link on the course website that collcode:
This is a string of characters that encodes several data for each item, including the physical home (aka branch), collection type, and collection name.
But it's unclear what the names of the collections actually mean. We may have to do additional exploraiton on SPL's website. I'm assuming that "map" is just that, and "ref" is just that as well, but otherwise, I couldn't find much information on the collection metadata, and since the current SPL website is for 2018, I didn't much digging to see what these retroactive designations meant.
** Why am I asking this? ** We know that maps are generally non-fiction, however, they aren't read like texts are. If the maps do have Dewey class values, what are they?
SELECT inraw.bibNumber, deweyClass.deweyClass, inraw.subj
FROM spl_2016.inraw, spl_2016.deweyClass
WHERE inraw.bibNumber = deweyClass.bibNumber AND
inraw.itemtype LIKE '%map'
ORDER BY deweyClass DESC;
bibNumber | deweyClass | subj |
---|---|---|
1848209 | 979 | NULL |
1757400 | 912 | NULL |
1757400 | 912 | NULL |
1693625 | 911 | NULL |
1693625 | 911 | NULL |
1693625 | 911 | NULL |
1693625 | 911 | NULL |
1693625 | 911 | NULL |
1693625 | 911 | NULL |
558499 | 910 | NULL |
447232 | 784 | NULL |
2659183 | 355 | NULL |
3031147 | 230 | NULL |
2010269 | 224 | NULL |
2694116 | 188 | NULL |
1603624 | 177 | NULL |
1663926 | 173 | NULL |
When we grab just the distinct Dewey classes, we get:
979, 912, 911, 910, 784, 355, 230, 224, 188, 177, 173
If we cross reference these classes with the Dewey classification csv given on the course website, we see that the maps fall under these subjects:
Dewey class | Description |
---|---|
173 | Ethics of family relationships |
177 | Ethics of social relations |
188 | Stoic philosophy |
224 | Prophetic books of Old Testament |
230 | Christianity |
355 | Military science |
784 | Instruments & Instrumental ensembles & their music |
910 | Geography & travel |
911 | Historical geography |
912 | Graphic representations of surface of earth and of extraterrestrial worlds |
979 | Great Basin & Pacific Slope region of United States |
Follow-up question (and sanity check)... is this right? Let's pick a few bibNumbers
(in bold) and check.
SELECT deweyClass.deweyClass, deweyClass.bibNumber, title.title, subject.subject
FROM spl_2016.deweyClass, spl_2016.title, spl_2016.subject, spl_2016.inraw
WHERE inraw.itemtype LIKE '%map' AND
deweyClass.bibNumber = title.bibNumber AND
title.bibNumber = subject.bibNumber AND
subject.bibNumber = inraw.bibNumber AND
deweyClass.bibNumber IN ('1663926', '2694116', '447232')
ORDER BY deweyClass.deweyClass ASC;
Dewey class | bibNumber | title | subject |
---|---|---|---|
173 | 1663926 | Ouachita National Forest pocket guide | Ouachita National Forest Ark and Okla |
188 | 2694116 | Shasta Trinity National Forest California | Outdoor recreation California Shasta National Forest Maps |
188 | 2694116 | Shasta Trinity National Forest California | Outdoor recreation California Trinity National Forest Maps |
188 | 2694116 | Shasta Trinity National Forest California | Shasta National Forest Calif Maps |
188 | 2694116 | Shasta Trinity National Forest California | Trinity National Forest Calif Maps |
784 | 447232 | Sing a song of holidays and seasons home neighborhood and community | Childrens songs |
Not sure why that last one has the subject "Children's songs," so maybe this is an error. Regardless, these entries do indeed seem to be maps!
** Why am I asking this? ** It's apparent from the previous research questions that maps seem like they would be hard to organize. I am curious if most of the maps are physical or digital copies for example, and if so, how they are organized both in storage and from publically accessible locations in the library. RQ1 in particular raises an interesting thought. I used barcode
for that query, which implies these are physical maps. But are there digital maps as well? It's not clear we would see this, as this database has tables like inraw
and outraw
, which implies that this database contains transactions for physical objects, but perhaps not digital entities (like a digitized map.) Regardless, I would like to see how many maps have no callNumber
, so I modify my first query. The following query helps to see what proportion of maps are easily or (not so easily) categorized.
SELECT COUNT(barcode)
FROM spl_2016.inraw
WHERE itemtype LIKE '%map' AND
callNumber = 'UNCAT';
32
Thirty two out of 287 maps have no callNumber
, which is essentially the "address" of the item. But how would one find these in the library?
** Why am I asking this? ** RQ3 makes me wonder what the available maps are even about. We can first figure out how many distinct subjects there are, but it would also be cool to know what are the top 20 subjects.
SELECT COUNT(DISTINCT subject.subject)
FROM spl_2016.subject, spl_2016.inraw
WHERE inraw.itemtype LIKE '%map' AND
subject.bibNumber = inraw.bibNumber;
277
There 277 different subjects for which the maps are about. What are the top 20 of these?
SELECT subject.subject, COUNT(subject.subject) AS total
FROM spl_2016.subject, spl_2016.inraw
WHERE inraw.itemtype LIKE '%map' AND
subject.bibNumber = inraw.bibNumber
GROUP BY subject.subject
ORDER BY total DESC
LIMIT 20;
rank | subject | total |
---|---|---|
1 | 75 | |
2 | Mount Baker Snoqualmie National Forest Wash Maps | 21 |
3 | Maps | 18 |
4 | Topographic maps | 15 |
5 | Outdoor recreation Washington State Mount Baker Snoqualmie National Forest Maps | 10 |
6 | Endangered species California | 9 |
7 | Endangered species Northwest Pacific | 9 |
8 | Spotted owl | 9 |
9 | Wildlife management California | 9 |
10 | Mount Baker National Forest Wash Maps | 9 |
11 | Wildlife management Northwest Pacific | 9 |
12 | Outdoor recreation Washington State Mount Baker National Forest Maps | 9 |
13 | Outdoor recreation Washington State Snoqualmie National Forest Maps | 9 |
14 | Snoqualmie National Forest Wash Maps | 9 |
15 | Mount Baker Wilderness Wash Maps | 7 |
16 | Noisy Diobsud Wilderness Wash Maps | 7 |
17 | Willamette National Forest Or Maps | 7 |
18 | United States Historical geography Maps | 6 |
19 | Pasayten Wilderness Wash Maps | 5 |
20 | Gifford Pinchot National Forest Wash Maps | 5 |
It's interesting that the top "subject" is no subject at all! And most of the rest of the maps are on Washington state or national forests, though the "Topographic maps" could be about anything.
** Why am I asking this? ** What is absent from the database is just as important as what's in it. The creation of maps also reflect ongoing changes in the political and administrative boundaries of the world. While boundaries may seem stable, they obviously weren't always so, and there are still many regions of the world in which cartographic boundaries are unclear or disputed, and this may either spur more map creation or revision.
We can see what's not being included in the collection of SPL's maps by looking at RQ4 and revisiting the results from RQ3. When I modify the query in RQ4 to include the Dewey class, I am reminded that MOST map listings do not have a Dewey class (and the top 20 certainly don't.) However, by revisiting the results from RQ3 and comparing them to our list of Dewey classifications, we can get a cursory view of which subjects are not being represented by the library.
The below table shows just some of the Dewey classes that were not included when RQ3 was conducted.
Dewey class | Description |
---|---|
914 | Geography of & travel in Europe |
915 | Geography of & travel in Asia |
916 | Geography of & travel in Africa |
917 | Geography of & travel in North America |
918 | Geography of & travel in South America |
919 | Geography of & travel in *Australasia, Pacific Ocean islands, Atlantic Ocean islands, Arctic islands, Antarctica, & on extraterrestrial worlds |
And basically the rest of the 900's could be encompassed within a map classification.
Also, there are no maps about extraterrestrial worlds? That's too bad...
Finding geographic information is tricky. A lot of geographic information systems (GIS) research is being conducted in geographic information retrieval and search. While it's not as easy as it seems, visualization is increasingly being used to overcome some of the challenges of finding, exploring and analyzing geographic data. Most physical and natural processes transcend political boundaries, and yet, this is where most data collection takes place. Even so, most maps that are available exist at a sub-administrative scale with a certain subject in mind, such as recreation (hiking in Washington) or tracking animal populations (mapping endangered species). While the 2016 SPL cataloguing of maps seems decent, there is room for improvement (an opportunity for GIS practitioners!)
In future work I would like to start looking more at the cultural interest of the available maps. I could compare what is in the database to what is actually being checked out. I would also love to explore which physical maps are being looked at versus digital copies of maps online. Furthermore, I think word embeddings offer a great opporunity to group the textual aspect of maps in a meaningful way, and see if this can somehow help improve the process of geographic information retrieval.