Goodreads: Get a User's List of Books

The Goodreads API suffers from poor documentation and a lack of features. Currently, it is not possible for a user to get a list of their saved books through the API. Furthermore, when you navigate to the My Books section, you need to keep scrolling to the end of the page until all books are loaded into the table. I wrote a small script in Python, using Selenium, BeautifulSoup, and pandas, to generate a CSV file that contains all my saved books.

Technologies used:

  • Python
  • Selenium
  • BeautifulSoup
  • Pandas

Prerequisites

  1. Download the Selenium driver (in my code I am using Chrome, but you can easily switch to Firefox; see the sketch after this list).
  2. When Selenium opens your profile, a pop-up will appear asking for your login credentials. The code ignores the pop-up by refreshing the page; however, in order to see your list of books, either stay logged in to Chrome or make your profile public.
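
A minimal setup sketch for those two points, assuming chromedriver (or geckodriver) is on your PATH; the profile path passed to --user-data-dir is a placeholder you would replace with your own Chrome user data directory so the browser session stays logged in:

from selenium import webdriver

# Option A: Chrome, reusing an existing profile so you stay logged in.
# The path below is hypothetical -- point it at your own Chrome user data directory.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--user-data-dir=/path/to/your/chrome/profile')
driver = webdriver.Chrome(options=chrome_options)

# Option B: use Firefox instead of Chrome (requires geckodriver on your PATH).
# driver = webdriver.Firefox()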

Understanding the Script

Start by importing the necessary libraries.

from selenium import webdriver
import time
from bs4 import BeautifulSoup
import pandas as pd

Initialize the webdriver for the Chrome browser and specify the URL you want to access (replace the numeric user ID in the URL with your own).

driver = webdriver.Chrome()
driver.get('https://www.goodreads.com/review/list/63365681?ref=nav_mybooks')

Wait for the pop-up to appear (2 seconds), then refresh the page.

time.sleep(2)
driver.refresh()

The goal is to stop scrolling once the table has loaded all your saved books. We can do this by accessing the HTML element at the bottom of the page which indicates the number of books loaded, e.g. "60 of 995 loaded". We strip any leading or trailing whitespace from the string, then split it on spaces. The first element of the resulting list is the current number of books loaded; the third element is the total number of books in the list.

s = driver.find_element_by_id('infiniteStatus').text.strip()
current = int(s.split()[0])
maxx = int(s.split()[2])

Next we loop until the end is reached and the table is filled with all books.

while current < maxx:
    # Scroll the footer into view to trigger loading of the next batch of books.
    element = driver.find_element_by_class_name("responsiveSiteFooter__heading")
    driver.execute_script("arguments[0].scrollIntoView();", element)
    time.sleep(2)
    s = driver.find_element_by_id('infiniteStatus').text.strip()
    current = int(s.split()[0])
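
Note that the find_element_by_* helpers were removed in Selenium 4; if you are running a newer version, the same loop can be written with By locators. A rough equivalent, assuming the same element id and class name:

from selenium.webdriver.common.by import By

while current < maxx:
    element = driver.find_element(By.CLASS_NAME, "responsiveSiteFooter__heading")
    driver.execute_script("arguments[0].scrollIntoView();", element)
    time.sleep(2)
    s = driver.find_element(By.ID, 'infiniteStatus').text.strip()
    current = int(s.split()[0])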

Once the page is fully loaded, we get the source code and search for a table with the id 'books'. Pandas has a built-in feature that converts table tags into a DataFrame.

html_source = driver.page_source
soup = BeautifulSoup(html_source, 'html.parser')
table = soup.find('table', id='books')
df = pd.read_html(str(table))[0]

We keep two columns (title and author), clean the strings, and write the result to a CSV file.

df = df[['title','author']]
df['title'] = df['title'].str.replace('title ', '')
df['author'] = df['author'].str.replace('author ', '')

df.to_csv('list_of_books.csv')

The Output

The output is a CSV file containing the titles and authors of all the books you have saved on Goodreads.
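
As a quick sanity check, you can load the generated file back with pandas and inspect the first few rows (the file name matches the one used above; the column names assume the cleaning step ran as shown):

import pandas as pd

books = pd.read_csv('list_of_books.csv', index_col=0)
print(books.head())               # first five titles and authors
print(len(books), 'books exported')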