This document covers some of the issues associated with first-time development environment setup and with collaboration using Git.
Creating a GitHub account
If you do not have an account already, go to GitHub and sign up for an account.
Please refer to the installation guide according to your operating system to install Git.
Once you have forked the code and begun contributing, periodically syncing your fork with the main City Bureau repository will keep you up to date with the project.
- You must first add a remote that Git can use to track the main City Bureau project. The remote URL is
<https://github.com/City-Bureau/city-scrapers.git>. Conventionally we name this remote source
upstream. The remote source for your original cloned repository is usually named origin.
$ git remote add upstream https://github.com/City-Bureau/city-scrapers.git
You can also list your existing remotes by running git remote -v.
- Once you’ve added the City Bureau remote, fetch the changes from upstream
$ git fetch upstream
- Make sure you are on the branch you want to merge changes into (typically your
master branch), then merge in the changes from the upstream remote
$ git checkout master
$ git merge upstream/master
- The final step is to update your fork on GitHub with the changes from the original repository by running
$ git push origin master
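The sync steps above can be sketched as a small shell helper. This is only an illustration, not part of the project: the function names ensure_upstream and sync_fork are hypothetical, and the final push assumes your fork's remote is named origin and that you are syncing the master branch unless you pass another branch name.

```shell
#!/bin/sh
# Hypothetical helpers (not part of city-scrapers) wrapping the sync steps.
UPSTREAM_URL="https://github.com/City-Bureau/city-scrapers.git"

# Add the "upstream" remote only if it is not configured yet.
ensure_upstream() {
    # `git remote get-url` exits non-zero when the remote does not exist.
    if git remote get-url upstream >/dev/null 2>&1; then
        echo "upstream already set to $(git remote get-url upstream)"
    else
        git remote add upstream "$UPSTREAM_URL"
        echo "added upstream -> $UPSTREAM_URL"
    fi
}

# Fetch upstream, merge it into the given branch, and push to the fork.
# Assumption: the fork's remote is named "origin".
sync_fork() {
    branch="${1:-master}"
    ensure_upstream &&
        git fetch upstream &&
        git checkout "$branch" &&
        git merge "upstream/$branch" &&
        git push origin "$branch"
}
```

After sourcing the file inside your clone, running sync_fork (or sync_fork some-branch) performs the same commands in order and stops at the first step that fails, for example on a merge conflict.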
Creating a virtual environment
The following gist covers common headaches with setting up a virtual environment on a Linux-like environment.
It is also possible to use venv to create your virtual environment.
$ python3.6 -m venv venv
$ source venv/bin/activate
Here we are naming the virtual environment
venv, which has been added to the project’s
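As a minimal sketch of the venv setup above, the following helper creates the environment only when it is missing and activates it only when no environment is active yet. The function names and the PYTHON_BIN variable are hypothetical; the default of python3 is an assumption, so set PYTHON_BIN=python3.6 to match the command shown earlier.

```shell
#!/bin/sh
# Hypothetical helper (not part of city-scrapers) for the venv workflow.
VENV_DIR="venv"
PYTHON_BIN="${PYTHON_BIN:-python3}"   # assumption: override to python3.6 if needed

# True when a virtual environment is already active in this shell.
# Activation scripts generated by venv export VIRTUAL_ENV.
venv_is_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Create the environment if its activate script is missing, then activate it.
activate_venv() {
    if venv_is_active; then
        echo "already active: $VIRTUAL_ENV"
        return 0
    fi
    if [ ! -f "$VENV_DIR/bin/activate" ]; then
        "$PYTHON_BIN" -m venv "$VENV_DIR" || return 1
    fi
    # `.` is the POSIX spelling of `source`.
    . "$VENV_DIR/bin/activate"
}
```

The file has to be sourced rather than executed, since activation only changes the shell it runs in.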
Getting Google API credentials
The system has a few scrapers that use the Google Sheets API to pull in data from manually updated spreadsheets. If you want to run or test these scrapers, you’ll need to get an API key.
If you need an API key for Google Sheets, you can get one for free by:
- Logging in to the Google API console, choosing “Enabled APIs and services”, searching for “Sheets API”, selecting “Google Sheets API”, and then clicking “ENABLE”.
- Then, in the left sidebar, choose “Credentials” and then “CREATE CREDENTIALS” -> “API Key”. You will be shown a key that you can save somewhere safe.
You’ll need to set this as an environment variable before running the new scraper. An easy way to do this is to just put it on the command line like so:
$ CITY_SCRAPERS_GOOGLE_API_KEY=TheTokenYouCreatedAbove scrapy crawl localschoolcouncil
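Setting the variable inline, as above, applies it to that single command only. If you run these scrapers often, you could instead export the key once per shell session and check that it is present before crawling. The sketch below illustrates that idea; the function name require_api_key is hypothetical and not part of the project.

```shell
#!/bin/sh
# Hypothetical guard (not part of city-scrapers): fail early when the key is missing.
require_api_key() {
    if [ -z "${CITY_SCRAPERS_GOOGLE_API_KEY:-}" ]; then
        echo "CITY_SCRAPERS_GOOGLE_API_KEY is not set" >&2
        return 1
    fi
}

# Usage, after sourcing this file:
#   export CITY_SCRAPERS_GOOGLE_API_KEY=TheTokenYouCreatedAbove
#   require_api_key && scrapy crawl localschoolcouncil
```

Avoid committing the key or writing it into files tracked by Git; exporting it in your shell session keeps it out of the repository.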