Research Day Blog

Keeping track of my research day activities!

October 16, 2019

In preparation for a talk that Rachel Wang and I will be giving at PyCon Canada on November 17, 2019, I took my second research day. We’ll be presenting on gathering insights from linked data, using RDFlib and SPARQL queries.

What did we get up to?

  1. Explored some graph databases – Neo4j and GraphDB. While both had great user interfaces, we decided to focus our efforts on GraphDB because it works well with RDF, and therefore RDFlib, whereas Neo4j uses Labelled Property Graphs. More information on this difference can be found on the Neo4j blog. GraphDB also has a tabular-data-to-RDF loader built on OpenRefine, a tool we’re both already familiar with and hope to use more (there will be more on that later!).
  2. Explored loading data into GraphDB through both the RDF options and the Tabular (OntoRefine) option. The OntoRefine option kept giving us problems – no matter the size of the file, it either seemed to load forever or errored out. This was especially frustrating because it happened even with the sample file GraphDB provides for testing this type of load.
  3. Developed SPARQL queries (a minimal sketch of what this looks like with RDFlib follows this list).
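To give a flavour of steps 2 and 3, here is a minimal RDFlib sketch: it parses a few Turtle triples and runs a SPARQL SELECT over them. The wine namespace and data below are invented for illustration – the real sample dataset lives in GraphDB.

```python
from rdflib import Graph

# A few hand-written triples standing in for the sample wine dataset
# (the namespace and data are invented for illustration).
TURTLE = """
@prefix wine: <http://example.org/wine/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

wine:Chardonnay a wine:WhiteWine ;
    rdfs:label "Chardonnay" ;
    wine:region wine:Burgundy .

wine:Merlot a wine:RedWine ;
    rdfs:label "Merlot" ;
    wine:region wine:Bordeaux .
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")

# A simple SPARQL SELECT: every wine with its label and region.
QUERY = """
PREFIX wine: <http://example.org/wine/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label ?region WHERE {
    ?wine rdfs:label ?label ;
          wine:region ?region .
}
"""

for row in g.query(QUERY):
    print(row.label, row.region)
```

The same query text can also be pasted straight into GraphDB’s SPARQL editor.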

Below is a screenshot of the GraphDB interface with the sample wine dataset we were testing on.
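As for the OntoRefine trouble in step 2, one workaround we could try next time is converting the tabular data to RDF ourselves, with RDFlib and the csv module, and then loading the result through GraphDB’s ordinary RDF import. A rough sketch – the column names and namespace here are invented, not from the real dataset:

```python
import csv
import io

from rdflib import Graph, Literal, Namespace, RDF, RDFS

WINE = Namespace("http://example.org/wine/")

# Stand-in for a CSV export; the columns are invented for illustration.
CSV_DATA = """name,colour,region
Chardonnay,white,Burgundy
Merlot,red,Bordeaux
"""

g = Graph()
g.bind("wine", WINE)

# Map each CSV row to a handful of triples.
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    subject = WINE[row["name"]]
    g.add((subject, RDF.type, WINE[row["colour"].capitalize() + "Wine"]))
    g.add((subject, RDFS.label, Literal(row["name"])))
    g.add((subject, WINE.region, WINE[row["region"]]))

# Serialize to Turtle, which GraphDB's regular RDF import accepts.
print(g.serialize(format="turtle"))
```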

July 30, 2019

For my first-ever research day, I decided to focus on skills that would be directly applicable in my current role. It wasn’t hard to decide that it would be invaluable to focus on the system API that I work with nearly daily, but that I lack hands-on experience with and a conceptual understanding of. I worked my way through the training manual, though I’m certainly left with some lingering questions. For example, some arguments don’t seem to appear in the manual at all, and my thinking about constructing scripts that combine Unix commands and local tools still needs some clarification.

Some of the things I learnt during my research day were:

  • the way our databases are structured and which keys are used where
  • server configuration (specifically, what the Unicorn directory I keep using actually is)
  • what Unix commands and the Bourne and Bash shells are

I now feel like I have a better basic understanding of how the API functions and which tools to use for which types of problems. However, I wish I had a better understanding of exactly how to write some of the commands, because I have a sneaking feeling that many options are missing from the training manual. I think this because some scripts I use, created before my time, have inputs or outputs not defined in the manual. When I get back to work I’ll see if I can find the definitions in the API using another shortcut I learnt about, “-x”, which lists tool, input, and output options; if not, I’ll ask my supervisor for assistance.

All of this was made even better by the view! I took the opportunity to work at one of my local public libraries – the Hillsburgh Public Library. Recently opened, with a heritage house forming part of its architecture, the library sits on the water and has an exceptional view and a beautiful collection – and my commute was 10 minutes instead of roughly 2 hours.