Building a scraper for recreation.gov

Published 2018-06-20 on Farid Zakaria's Blog

The start of a new project

A friend recently asked if I could look into building a tool / site to scrape https://recreation.gov, with the end goal of building a system that automatically reserves a desired permit.

This piqued my interest, so let's take a look at what I can do! At a high level I imagined building:
register desired site -> continuously scrape -> reserve -> notify via text
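To make that flow concrete, here is a minimal sketch of the loop in Python. Everything in it is hypothetical: the function name watch_and_reserve and the three callables (is_available, reserve, notify) are placeholders for the scraping, reservation, and texting pieces that don't exist yet, and max_rounds is only there so the example terminates.

```python
import time
from typing import Callable, Iterable, List


def watch_and_reserve(
    sites: Iterable[str],
    is_available: Callable[[str], bool],  # hypothetical: scrape and check one site
    reserve: Callable[[str], bool],       # hypothetical: attempt the reservation
    notify: Callable[[str], None],        # hypothetical: e.g. send a text message
    poll_seconds: float = 60.0,
    max_rounds: int = 1,
) -> List[str]:
    """Poll the registered sites, reserve what opens up, and notify on success."""
    pending = list(sites)
    reserved: List[str] = []
    for round_number in range(max_rounds):
        # Iterate over a copy because reserved sites are removed from pending.
        for site in list(pending):
            if is_available(site) and reserve(site):
                pending.remove(site)
                reserved.append(site)
                notify(f"Reserved {site}")
        if not pending:
            break
        # Sleep between rounds, but not after the final one.
        if round_number < max_rounds - 1:
            time.sleep(poll_seconds)
    return reserved


# Dry run with stand-in callables; nothing touches the network.
messages: List[str] = []
result = watch_and_reserve(
    ["whitney"],
    is_available=lambda site: True,
    reserve=lambda site: True,
    notify=messages.append,
    poll_seconds=0,
)
print(result, messages)
```

Passing the three steps in as callables keeps the loop testable while the real scraping and reservation code is still unknown.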

Looks like a good chance to put together some interesting technologies: a web framework (Django?) and Twilio to send notifications.

Alternatives

Before beginning any project, I take a look at the current space and see if there are any current open source alternatives or even a paid platform to leverage.

I found the following:

I could not find a paid service, and the OSS options seemed very difficult for non-technical people to use.

Can I haz API?

Browsing online, I was ecstatic when I came across ridb.recreation.gov, which is a REST API for the recreation.gov website. Unfortunately it doesn't let you perform reservations, and I couldn't yet decipher how to link its data to the reservation portion of the site. Perhaps it can be leveraged in the future!

Time to use our favorite reverse engineering tools: Wireshark & Charles. I ended up using Charles specifically because I find it easier to set up as a man-in-the-middle HTTPS proxy.

You can follow the simple guide on how to set up Charles as an HTTPS proxy here

Here is the raw request from Charles when searching locations matching whitney at https://www.recreation.gov/unifSearch.do
(unimportant parts stripped out)


POST /unifSearch.do HTTP/1.1
Host: www.recreation.gov
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:60.0) Gecko/20100101 Firefox/60.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.recreation.gov/unifSearch.do
Content-Type: application/x-www-form-urlencoded
Content-Length: 275
Connection: keep-alive
Upgrade-Insecure-Requests: 1

currentMaximumWindow=12&locationCriteria=whitney&interest=&locationPosition=&selectedLocationCriteria=&resetAllFilters=true&filtersFormSubmitted=false&glocIndex=0&googleLocations=Whitney+Place+Northwest%2C+Seattle%2C+WA%2C+USA%7C-122.39853319999997%7C47.6974492%7C%7CLOCALITY

The important part is that the body is sent as application/x-www-form-urlencoded and carries locationCriteria=whitney.
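As a rough sketch, the same request can be built with Python's standard library. The field names are copied from the capture above; dropping the googleLocations field is an assumption on my part (I have not verified the server accepts the request without it), and build_search_request is just an illustrative name. The snippet only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://www.recreation.gov/unifSearch.do"


def build_search_request(location: str) -> Request:
    """Build (but do not send) the form POST seen in the Charles capture."""
    form = {
        "currentMaximumWindow": "12",
        "locationCriteria": location,
        "interest": "",
        "locationPosition": "",
        "selectedLocationCriteria": "",
        "resetAllFilters": "true",
        "filtersFormSubmitted": "false",
        "glocIndex": "0",
        # Assumption: googleLocations from the capture is omitted here.
    }
    # urlencode percent-encodes values and joins the pairs with "&".
    body = urlencode(form).encode("ascii")
    return Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_search_request("whitney")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Sending it would be a matter of passing the object to urllib.request.urlopen, possibly with the other captured headers added if the site turns out to require them.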

The response is HTML; however, we can use various tools to pull the desired list of locations out of it.
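One such tool is the standard library's html.parser. The sketch below is only illustrative: the sample markup and the facility_link class are invented for the example, since the real structure of the search results page isn't shown here and would need to be checked against the actual response.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.results = []
        self._href = None  # set while inside a matching <a>
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._href = attr_map.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical markup; the real page will differ.
SAMPLE_HTML = """
<div id="search_results">
  <a class="facility_link" href="/permits/example-1">Mt. Whitney</a>
  <a class="other" href="/help">Help</a>
  <a class="facility_link" href="/camping/example-2">Whitney Portal</a>
</div>
"""

parser = LinkExtractor("facility_link")
parser.feed(SAMPLE_HTML)
for name, href in parser.results:
    print(name, href)
```

A third-party parser would make the selection logic shorter, but the standard library is enough to show the idea.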