Working With Nokogiri

I recently worked a bit with Nokogiri to parse some XML. I decided to parse the XML behind the map for the Craftsmanship Manifesto. The map is here and the XML behind the map can be found here. I put my code on GitHub, and you can find it here.

I tried parsing the XML the textbook way. bin/ calls lib/first_parser.rb. It’s a mess. It seems like you have to go through both the Element and the Text classes to get at an element's content. At least that is what I remember and what I can gather from the code. I have a lot of comments in there, as I always do in code that is just for exploration. Still, having to deal with two classes to get one element just seems wrong.

I then looked into using XPath. bin/ calls lib/show_parser.rb, which is the example on the Nokogiri site. I was able to parse the XML with bin/, which calls lib/first_path_parser.rb, and bin/, which calls lib/path_parser.rb. I had a problem with namespaces. I first tried doc.remove_namespaces! but I did not like the idea of disabling namespaces. The document's elements had no namespace prefix, so I just prepended “xmlns:” to all the element names in my XPath expressions and I got it to work.

Eventually I decided to try JSON. I found out about a gem called crack, which can convert an XML document to JSON. That is in bin/, which calls lib/crack_is_whack.rb. I was able to try it out with bin/, which calls lib/first_json_attempt.rb. It parses a small version of the file. The whole map is parsed and output to CSV with bin/, which calls lib/json_parser.rb.
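The JSON-to-CSV step can be sketched with just the standard library. The JSON string below stands in for the output of the XML conversion, which is not shown, and its shape and field names are my assumptions rather than the real map data. This is not the code in lib/json_parser.rb.

```ruby
require 'json'
require 'csv'

# Stand-in for the converted XML; the structure is hypothetical.
json = <<~JSON
  {"markers": {"marker": [
    {"name": "Example Co", "city": "Chicago"},
    {"name": "Another Co", "city": "Austin"}
  ]}}
JSON

data = JSON.parse(json)
rows = data.dig('markers', 'marker')

# A converter may emit a single child element as a Hash instead of an
# Array (an assumption worth checking), so normalize to an Array.
rows = [rows] if rows.is_a?(Hash)

# Use the union of all keys as the header row.
headers = rows.flat_map(&:keys).uniq

csv_text = CSV.generate do |csv|
  csv << headers
  rows.each { |row| csv << row.values_at(*headers) }
end

puts csv_text
```

Once the data is plain hashes and arrays, there are no node classes or namespaces to think about, which is most of why this felt easier.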

JSON is a lot easier than XML.

Image from Aurora Consurgens, a 15th-century manuscript housed at the Central Library of Zurich. Image from e-Codices. This image is assumed to be allowed under Fair Use.