“THE BOOK WAS BETTER THAN THE MOVIE” – It’s a phrase we hear all too often, usually from diehard fans of a book or book series who were disappointed by its big-screen counterpart. In recent years, studios have increasingly stepped back from writing new stories and let established authors take the spotlight, using previously written novels and stories as the basis for motion pictures. But is this always the smart choice? Just as movies have fanbases, so do novels, and some of these fans will go to great lengths to defend their favorites. Movie adaptations sometimes leave book fans repeating that familiar phrase: “the book was better than the movie.” The Data Dugongs have decided to put this phrase to the test by comparing the Goodreads ratings of books to the IMDB ratings of their movie counterparts.
The Data Dugongs are made up of three members: the writer - Allison Brekosky, a junior communication major at Pitt; the designer - Lliana Hwang, a senior Information Science major; and the coder - Steven Barash, a junior Information Science major. Together, our team works to provide accurate and interesting datasets.
Our inspiration came from looking at both Goodreads and IMDB ratings charts: both hold a lot of data that is valuable to users looking to pick out their next source of entertainment. Our main idea came from two existing Goodreads lists.
The first list is literally titled “The Book was Better than the Movie.” In this list, books are ranked from “greatest book in comparison to worst movie” to “worst book in comparison to greatest movie.”
The second list makes the opposite claim and is titled accordingly: “The Movie was Better than the Book.” Its entries are ranked by movie upvotes.
Together, the lists contain 2,511 books. After removing repeated entries and narrowing the selection down to well-known titles, we are drawing from 231 movie/book combinations to make our datasets.
Our data was gathered by a program we wrote in C# that pulls data from the IMDB API based on the tag “based on novel or book”. The API returned a JSON file, which was then deserialized into an array of C# objects. Each object was passed through a function that extracted only the information we needed for this project: the film’s title, year, genre, user rating, and a plot summary. At first, the API returned tens of thousands of lines of data, covering any film based on a novel, including indie and international films. Unfortunately, the computer would freeze whenever this data was being processed. Luckily, the API had a “popularity” value, and we raised the minimum value we accepted in order to filter out less popular movies. After this, we were left with over 230 of the most popular films based on books. We then wrote a function that saved this data to a CSV file, which would then be plugged into our graphing software.
Initially, we planned to complete our dataset by searching the Goodreads API for books with the same titles as the films on our list, but we ran into two problems:
- Not all films based on a book share the same name as that book.
- Some titles overlap: there may be multiple books with the same title, or a graphic novel or children’s book based on the film or book that shares its title.
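The two problems above can be illustrated with a small sketch. It is written in Python for brevity rather than in the C# our program used, it does not call the Goodreads API, and the book records, field names, and the `index_by_title` and `match_film` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical book records, made up for illustration only.
BOOKS = [
    {"title": "Example Story", "format": "novel"},
    {"title": "Example Story", "format": "graphic novel"},
    {"title": "A Different Name", "format": "novel"},
]


def index_by_title(books):
    """Group book records by a case-insensitive version of their title."""
    index = defaultdict(list)
    for book in books:
        index[book["title"].casefold()].append(book)
    return index


def match_film(film_title, index):
    """Look up a film title and report whether the match is usable."""
    candidates = index.get(film_title.casefold(), [])
    if len(candidates) == 1:
        return "matched", candidates
    if not candidates:
        # Problem 1: the film does not share its name with the book.
        return "no match", candidates
    # Problem 2: several books (or editions) share the same title.
    return "ambiguous", candidates


index = index_by_title(BOOKS)
for film in ["Example Story", "Film With Another Title", "a different name"]:
    status, found = match_film(film, index)
    print(f"{film}: {status} ({len(found)} candidate(s))")
```

Only the third lookup comes back with a single usable candidate, which is why a title-only search could not complete the dataset on its own.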
As a result, the API wasn’t giving us consistent results, and because we wanted to start visualizing our data as soon as possible, we decided to gather the rest of the data the old-fashioned way. We split the task among the three of us, manually searching Goodreads for each movie’s corresponding book rating and year of publication. All of this data was eventually consolidated into a spreadsheet, on which we based the following visualizations.
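For readers curious about the film side of the pipeline (parse the JSON, keep only the needed fields, drop less popular films, write a CSV), here is a minimal sketch. It is in Python rather than the C# we actually used, the sample records are invented, and the field names, the `min_popularity` cutoff of 10.0, and the helper functions are assumptions made for illustration, not the real API's schema.

```python
import csv
import io
import json

# Hypothetical API response; field names and values are made up for illustration.
SAMPLE_JSON = """
[
  {"title": "Example Film A", "year": 2001, "genre": "Drama", "rating": 7.9,
   "summary": "A placeholder plot.", "popularity": 85.0, "budget": 1000},
  {"title": "Example Film B", "year": 2014, "genre": "Comedy", "rating": 6.1,
   "summary": "Another placeholder plot.", "popularity": 3.2, "budget": 50}
]
"""

# The five pieces of information kept for each film.
FIELDS = ["title", "year", "genre", "rating", "summary"]


def extract_fields(film):
    """Keep only the fields needed for the project."""
    return {name: film.get(name) for name in FIELDS}


def filter_popular(films, min_popularity):
    """Drop films below the popularity cutoff, then trim each record."""
    return [
        extract_fields(film)
        for film in films
        if film.get("popularity", 0) >= min_popularity
    ]


def to_csv(rows):
    """Serialize the trimmed records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


films = json.loads(SAMPLE_JSON)  # JSON text -> list of dicts
popular = filter_popular(films, min_popularity=10.0)
csv_text = to_csv(popular)
print(csv_text)
```

Filtering before trimming matters here: the popularity value is needed to decide which films to keep, but it is not one of the columns written to the CSV.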
Our datasets are important because they will not only answer the age-old question, but also show whether the older medium of books holds up against the newer medium of film. Since the creation of mediums that combine sight and sound (film, television), other printed and spoken mediums have lost their uniqueness. As technology evolves, so do our expectations and opinions about which mediums are the most fascinating. Radio triumphs over print, television triumphs over radio… do film adaptations triumph over books?