Library Mashup Success Story: Using importXML to scrape the OPAC

The following is a guest post from Kim Gormley.

I’m a librarian at the West Chester Public Library who’s just beginning to teach myself some programming, inspired by the projects in Library Mashups and elsewhere. Thank you for compiling such a great resource – I never thought I would be excited about coding!

I’d like to share a beginner-level trick that has helped us tremendously: using the importXML feature in Google Spreadsheets to make a dynamic list of required summer reading books that are on the shelf right now at WCPL. Does it still count as a mashup if we’re only using one data source? I’ll assume it does and plow on ahead. 🙂

The Problem: Most kids coming in for summer reading books just want to know which titles we have now and how to get their hands on them as quickly as possible. Unfortunately, our OPAC doesn’t group all editions of a work or allow you to sort by shelf status, so the search takes ages, even if you’re a whiz at using the catalog. Which you’re not, because no one is.

The Solution:

Yes, there are probably much better ways to do this (Ajax? JavaScript?). It’s a kludge, but a surprisingly simple and effective kludge. The spreadsheet recovers quickly from its occasional loading problems, and doing 200+ imports every two minutes hasn’t hurt our server. For a non-programmer newbie (like me) it’s an easy win. You can set up a list in a matter of days, bask in the gratitude of patrons and librarians alike, and do it all with just three Spreadsheet formulas.

Other possible uses: “Reading Olympics” books, princess books (we get that one all the time), new music CDs, new DVDs, bestsellers, Oprah’s picks, etc.

Here’s a bit of the cheat sheet I made for my coworkers, just in case you want it. It has the three formulas.

1. An import function looking for any copies of 1984 that are on our shelves

=importXml(“,1,1,B/holdings&FF=Xt%3A%281984%29+and+a%3A%28orwell%29&1,1,”&”?workaround=”&INT(NOW()*1E3)&REPT(GoogleFinance(“GOOG”);0);”//td[contains(.,’ON SHELF’) or contains(.,’Recent’)]/ancestor::tr/td[contains(.,’WC’)]/following-sibling::td[1]”)

You will have to repeat this formula for each edition with a separate catalog record. Group all of these imports into one block so that you can easily use…

2. A formula to grab the import information from #1 and smoosh all the unique call numbers neatly into one cell

=arrayformula(concatenate(unique(substitute(D2:E20;”field v”;CHAR(13)))))

(I imported the field elements in #1 so I could transform them into carriage returns here.) Google limits you to 50 import functions per spreadsheet, so large lists have to be chunked into several “feeder” documents, which are then meta-scraped by the final document. Formulas #1 and #2 are in a feeder document.

Put all the #2 formula cells into one column so that you can import the whole range into the final document using…

3. A final import formula to get the call number data from the feeder

=arrayformula(if(importrange(“0AvLMkghcvzfedDFiTVFIVmFfd24wbF9yY3BpWndaNUE&hl=en”&”?workaround=”&INT(NOW()*1E3)&REPT(GoogleFinance(“GOOG”);0);”Feed!B2:B41″)=”#VALUE!”;”Not on Shelf”;if(importrange(“0AvLMkghcvzfedDFiTVFIVmFfd24wbF9yY3BpWndaNUE&hl=en”&”?workaround=”&INT(NOW()*1E3)&REPT(GoogleFinance(“GOOG”);0);”Feed!B2:B41″)=””;”Not on Shelf”;concat(concat(“On Shelf:”,CHAR(13)),importrange(“0AvLMkghcvzfedDFiTVFIVmFfd24wbF9yY3BpWndaNUE&hl=en”&”?workaround=”&INT(NOW()*1E3)&REPT(GoogleFinance(“GOOG”);0);”Feed!B2:B41″)))))

The feeder document contained 40 titles, with the #2 formula cells collected in B2:B41. Titles not on the shelf could result in either a blank cell or a VALUE! error, so I had to account for both possibilities.

That’s it. Just copy and paste the formulas, changing the URLs and data ranges as necessary.

One Response

  1. Library Mashups : More Library Mashups

    […] the guest post last week, I got this email about more library mashups! Will Kurt, Applications Development Librarian at […]

Leave a Reply