Research Day Blog

Keeping track of my research day activities!

December 18, 2019

I’m looking forward to being a reviewer for Against the Grain, and have finally received my first book in the mail!

Without spoiling the review, I’m enjoying the book overall, and have found it to be insightful when considering larger questions around professionalization, working conditions, and my very recent past as an LIS student.

November 1, 2019

Rachel and I continued to work away preparing for PyCon Canada. I really should have kept better notes, but the day just flew by. I was so excited – and nervous – for PyCon Canada that I really didn’t have time to relax until after it was all done. That said, I’m looking forward to coming back to the linked data work we were doing, but starting from scractch and creating the dataset ourselves.

Because PyCon Canada fell on a weekend I didn’t take any research days for it. However, I thought I’d include it in this post because it was what Rachel and I were working towards. At PyCon, I mostly attended sessions in the PyData track, and had the opportunity to meet a lot of people who were not librarians, which I always find to be a rich way to spend my time. I worry often that as librarians we often make things harder for ourselves than they have to be, when people in other disciplines and professions are doing similar work, but we just don’t know about it.

I enjoyed learning about word2vec, in a presentation given by Roberto Rocha (a data journalist at CBC). Word2vec is used to determine document similarity, and in this case was presented in 3D through tensor. Leaving the session I couldn’t help but think of the interesting potential word2vec presents, maybe analyzing online reference questions…

Another thing that I enjoyed learning about, which was presented in the lightning talks, is gazpacho. Gazpacho is meant to be a simpler, easier-to-use replacement for (most) things that are currently done with beautifulsoup.

I also was grateful for the knowledge shared by Niharika Krishnan (Understanding Autistic Children Using Python), Stephen Childs (Data Viz with Altair), Serena Peruzzo (Data Cleaning), Jill Cates (Algorithmic Bias), Manav Mehra (Growing Plants with Python), Cesar Osario (Voice recognition using python and deep learning), Anuj Menta, and Josh Reed (Putting your data in a box). In addition, the conversations and connections I made with other people who are interested in tech and libraries was so valuable!


October 16, 2019

In preparation for a talk that Rachel Wang and I will be giving at Pycon Canada on November 17, 2019, I took my second research day. We’ll be presenting on gathering insights from linked data, using RDFlib and SPARQL queries.

What did we get up to?

  1. Explored some graphing databases – Neo4j and GraphDB. While both had great user interfaces, we decided to focus our efforts on GraphDB because it works well with RDF, and therefore RDFlib, whereas Neo4j uses Labelled Property Graphs. More information on this difference can be found on the Neo4j blog. GraphDB also has a tabular data to RDF loader, which is built off of OpenRefine, a tool we’re both already familiar with and hope to use more (there will be more on that later!).
  2. Explored loading data into GraphDB both through the RDF options and Tabular (OntoRefine) option. The Ontorefine option kept giving us problems – no matter the size of the file, it seemed to continue to load forever, or error out. This was frustrating because it was even the sample file provided by GraphDB to test this type of load.
  3. Developed SPARQL queries.

Below is a screenshot of the GraphDB interface with our sample wine dataset we were testing on.

July 30, 2019

For my first-ever research day, I decided to focus on skills that would be directly applicable in my current role. It wasn’t hard to decide that it would be invaluable to focus on developing skills with the system API that I work with nearly daily, but that I lack experience with, and a conceptual understanding of. I worked my way through the training manual, though I’m certainly left with some lingering questions. For example, some of the arguments seem not to be present in the manual, and some thoughts about constructing scripts using a combination of unix commands and local tools need some clarification.

Some of the things I learnt during my research day were:

  • the way our databases are structured and which keys are used where
  • server configuration (specifically, what does this Unicorn directory I keep using mean)
  • what are Unix commands and the Bourne and Bash shells?

I now feel like I have a better basic understanding of how the API functions and which tools to use for which types of problems. However, I wish I would have a better understanding of exactly how to write some of the commands, because I have a sneaking feeling that many options were missing in the training manual. I think this because there are some scripts that I use, which were created before my time, that have inputs or outputs not defined in the manual. When I get back to work I’ll see if I can find the definitions in the API using another shortcut I learnt about “-x” to list tool, input and output options, and if not I’ll ask my supervisor for assistance.

All of this was made even better by the view! I took the opportunity to work at one of my local public libraries – the Hillsburgh Public Library. Recently opened as architecturally part heritage house, the library is on the water and has an exceptional view and beautiful collection, and the commute was 10 minutes instead of approximately 2 hours.