Is the movie or book better?

“THE BOOK WAS BETTER THAN THE MOVIE.” It’s a phrase we hear all too often, usually from diehard fans of a book or book series who are then disappointed by its big-screen counterpart. In recent years, studios have stepped back from writing original stories and let established authors take the spotlight, using previously written novels and stories as the basis for motion pictures. But is this always the smart choice? Just as movies have fanbases, so do novels, and some of those fans will go to great lengths to defend their favorites. Sometimes a movie adaptation leaves book fans repeating that familiar phrase: “the book was better than the movie.” The Data Dugongs decided to put this phrase to the test by comparing the Goodreads ratings of books to the IMDB ratings of their movie counterparts.


The Data Dugongs are made up of three members: the writer, Allison Brekosky, a junior Communication major at Pitt; the designer, Lliana Hwang, a senior Information Science major; and the coder, Steven Barash, a junior Information Science major. Together, our team works to provide accurate and interesting datasets.


Our inspiration came from looking at both Goodreads and IMDB ratings charts: both hold a lot of data that is valuable to everyday users picking out their next source of entertainment. Our main idea came from two existing datasets on Goodreads.


The first list is literally titled “The Book was Better than the Movie.” In this dataset, books are ranked from “greatest book in comparison to worst movie” to “worst book in comparison to greatest movie.”


The second list takes the opposite position and is titled accordingly: “The Movie was Better than the Book.” Its entries are ranked by movie upvotes.


Together, the lists contain 2,511 books. After removing titles that appeared more than once and narrowing the selection to well-known titles, we drew on 231 movie/book combinations to build our datasets.


Our data was gathered by a program we wrote in C# that pulls data from the IMDB API using the tag “based on novel or book.” The API returned JSON, which was then deserialized into an array of C# objects. Each object was passed through a function that extracted only the information we needed for this project: the film’s title, year, genre, user rating, and plot summary. At first, the API returned tens of thousands of lines of data, including every film based on a novel, whether it was an indie film or an international film. Unfortunately, the computer froze whenever this data was being processed. Luckily, the API had a “popularity” value, which we raised in order to filter out less popular movies. After this, we were left with over 230 of the most popular films based on books. We then wrote a function that exported this data to a CSV file, which we plugged into our graphing software. Initially, we planned to search the Goodreads API with our existing list of film titles and look up books of the same title to complete our dataset, but we ran into two problems:

  1. Not all films based on a book share the same name as the book.
  2. Some titles overlap: there may be multiple books with the same title, or a graphic novel or children’s book based on the film or book that shares its title.

As a result, the API wasn’t giving us consistent results, and we wanted to start visualizing our data as soon as possible, so we decided to gather the rest of the data the old-fashioned way... We split the task of manually searching Goodreads among the three of us, finding each movie’s corresponding book rating and year of publication. All of this data was eventually consolidated into a spreadsheet, on which we based the following visualizations.
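The filtering and export steps described above could look roughly like the sketch below. It is written in Python rather than the C# the team actually used, and it runs on a small inline JSON sample instead of a live API call. The field names (`title`, `year`, `genre`, `rating`, `plot`, `popularity`), the placeholder films, the helper names, and the popularity cutoff are all hypothetical assumptions, not the real API's schema.

```python
import csv
import io
import json

# Hypothetical API response; field names and values are placeholders,
# not the schema or data of any real movie API.
SAMPLE_RESPONSE = """
[
  {"title": "Film A", "year": 2014, "genre": "Romance",
   "rating": 7.5, "plot": "A placeholder summary.", "popularity": 40.0},
  {"title": "Film B", "year": 2011, "genre": "Drama",
   "rating": 6.0, "plot": "Another placeholder.", "popularity": 1.5}
]
"""

# The only fields kept for the project: title, year, genre, user rating, plot.
FIELDS = ["title", "year", "genre", "rating", "plot"]


def extract_popular(records, min_popularity):
    """Keep films at or above the (hypothetical) popularity cutoff,
    trimmed down to the fields in FIELDS."""
    return [
        {field: record[field] for field in FIELDS}
        for record in records
        if record.get("popularity", 0) >= min_popularity
    ]


def to_csv(rows):
    """Render the filtered rows as CSV text for a graphing tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = json.loads(SAMPLE_RESPONSE)  # JSON text -> list of dicts
popular = extract_popular(records, min_popularity=10.0)
print(to_csv(popular))
```

Raising `min_popularity` shrinks the result set, which mirrors how the team cut tens of thousands of lines down to a list the computer could handle.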


Our datasets are important because they not only answer the age-old question but also test whether the older medium of books holds up against the newer medium of film. Since the creation of mediums that combine sight and sound (film, television), printed and spoken mediums have lost their uniqueness. As technology evolves, so do our expectations and opinions about which mediums are the most fascinating. Radio triumphs over print, television triumphs over radio... do film adaptations triumph over books?

Social Media Polling



Before analyzing the ratings of hundreds of book and movie combinations, The Data Dugongs first wanted to see how the famous phrase held up on a small scale: in general, is the book REALLY better than the movie? We took to social media to find out.

Social media has recently become an amazing medium for polling. Sites such as Facebook and Twitter have incorporated polling features into their status updates, making it easier than ever for users to write questions and put them to their followers. Social polling, as the media has named it, is the fusion of opinion polling and social media in an attempt to gauge opinions on an influencer-to-followers basis. While social polling has increasingly been used in branding (Econsultancy.com), we used it to our advantage: on Instagram and Twitter, we polled our personal followers to see how books held up against movies.

After the polling window closed, just shy of 100 users in total had responded to our polls (65 on Instagram, 31 on Twitter). As seen above, books won out over their movie counterparts. While this was only a general survey, it seems the peers we surround ourselves with would much rather read a novel than watch a movie. But let's look bigger...

Goodreads vs. IMDB ratings



Goodreads and IMDB are major hubs for book reading and movie watching, respectively. Registered users can contribute their own synopses, comments, and, in this case, ratings for titles they have read or seen. You could call Goodreads and IMDB the "Wikipedias of entertainment," since all of their content is user-submitted. For ratings, both sites use star-rating systems, in which users rate content and the ratings are averaged together. IMDB's rating system runs from 1 to 10 stars, while Goodreads runs from 1 to 5 stars, so we doubled Goodreads' average ratings to make the above graph.
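As a minimal sketch of the scale conversion described above, the hypothetical helper below doubles a Goodreads average so it can sit on IMDB's 10-point axis. One caveat worth keeping in mind: doubling maps the 1 to 5 range onto 2 to 10 rather than 1 to 10. A linear rescale, sketched here only as an alternative and not as what the team did, would line the two minimums up exactly.

```python
def goodreads_to_ten_point(goodreads_avg: float) -> float:
    """Double a Goodreads average (1-5 stars), as the article does.

    Note: this maps 1..5 onto 2..10, not 1..10.
    """
    if not 1.0 <= goodreads_avg <= 5.0:
        raise ValueError("Goodreads averages fall between 1 and 5")
    return goodreads_avg * 2


def linear_rescale(value, old_min, old_max, new_min, new_max):
    """Alternative (not the article's method): map old_min..old_max
    onto new_min..new_max so both endpoints line up."""
    return new_min + (value - old_min) * (new_max - new_min) / (old_max - old_min)


print(goodreads_to_ten_point(4.0))       # 8.0
print(linear_rescale(4.0, 1, 5, 1, 10))  # 7.75
```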

As previously mentioned, The Data Dugongs looked at 231 entries in an attempt to settle the age-old rivalry between book and movie. When we averaged the ratings of the 231 films and the 231 books they are based on, the books' overall rating beat the movies' overall rating by 1.03 stars.
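The overall comparison amounts to averaging each column and taking the difference. The sketch below assumes each entry is a (Goodreads average, IMDB rating) pair and that book ratings are doubled first; the three pairs and the `average_gap` helper are placeholders, not the team's data or code, so the printed gap will not match the 1.03 stars reported above.

```python
from statistics import mean

# Placeholder (goodreads_avg_out_of_5, imdb_rating_out_of_10) pairs;
# the real dataset has 231 of these.
pairs = [(4.2, 7.1), (3.9, 6.8), (4.5, 8.0)]


def average_gap(pairs):
    """Mean book rating (doubled to the 10-point scale) minus mean movie rating."""
    book_scores = [goodreads * 2 for goodreads, _ in pairs]
    movie_scores = [imdb for _, imdb in pairs]
    return mean(book_scores) - mean(movie_scores)


print(round(average_gap(pairs), 2))  # a positive value means books rated higher
```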

This data supports the aforementioned statement: "the book is better than the movie." Looking at the user ratings, however, the gap is closer than we had predicted. The result is also supported by our social media polling. After posting a general poll, The Data Dugongs received quite a few messages:

"It depends," one user wrote.

"Book, but sometimes it's easier to watch the movie then read the book"

"I find the benefits of both"

While the answer to the general question "book or movie?" varies from person to person, for the second time in a row, book wins. But how do big-time Hollywood directors choose which books, or which book genres, to make into major motion pictures? Well, there's a recurring pattern throughout Hollywood...

Book to Film Adaptations (by Genre)



Some of the most-anticipated movies of 2019 will be followed by a credit that reads "Based on the book..." Five Feet Apart, Pet Sematary, and even It Chapter Two will all hit the big screen this year. While novel adaptations are becoming more and more popular in Hollywood, that doesn’t mean studios will turn just any book into a movie. Even without writing their own original stories, big Hollywood directors know what does and doesn't work in the realm of adaptations.

The Data Dugongs took our list of 231 films based on books and plotted them by GENRE, as seen above. When all genres were plotted on the same graph, the data was too close to support a narrative on its own, so we split it into 3 sections: the 5 largest categories, the 6 middle categories, and the 6 smallest categories. This helped us craft a more meaningful visualization.

At the box office recently, and as reflected in our findings, there seems to be a "Big 5" of book genres that get adapted into movies. Romance makes up the largest share of the top 5 with 24% (thanks to films such as The Fault in Our Stars and Fifty Shades of Grey), followed by Fantasy (21%), Action (20%), Science Fiction (19%), and Drama/Thrillers (16%). There is a method to the madness, as these genres consistently succeed at the box office, spawning sequels and large amounts of revenue. In fact, according to recent statistics, action and drama films are some of the top-grossing film genres overall, which may explain their dominance in our findings. But, let's look closer...
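The three-way split and the "Big 5" percentages can be reproduced from a count per genre: rank the genres by how many films they have, slice off the top 5, the next 6, and the remaining 6, then express each top-5 genre as a share of the top-5 total. In the sketch below, the synthetic labels (`genre_0` through `genre_16`) stand in for the real genres and the helper names are hypothetical.

```python
from collections import Counter


def split_into_tiers(genre_labels, top=5, mid=6):
    """Rank genres by film count and slice them into top/mid/low tiers."""
    counts = Counter(genre_labels)
    ranked = [genre for genre, _ in counts.most_common()]
    return counts, ranked[:top], ranked[top:top + mid], ranked[top + mid:]


def shares_within(tier, counts):
    """Each genre's percentage of the films in that tier (rounded)."""
    total = sum(counts[genre] for genre in tier)
    return {genre: round(100 * counts[genre] / total) for genre in tier}


# Synthetic data: 17 genres with 20, 19, ..., 4 films each (one label per film).
labels = [f"genre_{i}" for i in range(17) for _ in range(20 - i)]
counts, top, middle, low = split_into_tiers(labels)
print(len(top), len(middle), len(low))  # 5 6 6
print(shares_within(top, counts))
```

Computing shares within a tier, rather than across all 231 films, is what makes the top-5 percentages add up to roughly 100.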

Book vs. Movie vs. Genre




As shown in the previous graph, the movie industry invests more in romance adaptations than in non-fiction. But according to the Goodreads and IMDB dataset, the highest- and lowest-rated genres are:

(Book)
Highest: Adventure
Lowest: Neo-Noir
(Movie)
Highest: Biography
Lowest: Historical

These results may be skewed by large differences in genre popularity: some genres had fewer than 5 listings, while others had 40 or more. To get a more accurate reading, we split the genres into the same 3 popularity levels used earlier.

Top (Book)
Highest: Drama
Lowest: Romance

Mid (Book)
Highest: Adventure
Lowest: Comedy

Low (Book)
Highest: Non-Fiction
Lowest: Neo-Noir

Top (Movie)
Highest: Thriller
Lowest: Action

Mid (Movie)
Highest: Adventure/Fiction
Lowest: War

Low (Movie)
Highest: Biography
Lowest: Historical
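Picking the highest- and lowest-rated genre within each tier is a small aggregation: average each genre's ratings, then take the maximum and minimum. In the sketch below, the `highest_and_lowest` helper, the genre names, and the ratings are all placeholders. Note that a genre with only a handful of listings gets an average built on very few ratings, which is the skew the tier split is meant to reduce.

```python
from statistics import mean


def highest_and_lowest(ratings_by_genre, tier):
    """Return the (highest, lowest) genre in a tier by average rating."""
    averages = {genre: mean(ratings_by_genre[genre]) for genre in tier}
    highest = max(averages, key=averages.get)
    lowest = min(averages, key=averages.get)
    return highest, lowest


# Placeholder ratings on the 10-point scale, keyed by genre.
ratings_by_genre = {
    "genre_a": [8.0, 9.0],
    "genre_b": [6.0, 7.0],
    "genre_c": [7.5],
}
print(highest_and_lowest(ratings_by_genre, ["genre_a", "genre_b", "genre_c"]))
```

Running the function once per tier, separately for book and movie ratings, yields a listing shaped like the one above.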

One interesting pattern is that the differences between the average ratings are minuscule, which suggests users enjoy a work as long as it meets or exceeds a standard of quality storytelling. The gaps between book and movie ratings in the genre breakdown above are almost exactly the same as the overall gap (as seen in Graph 2). In no genre are the book and movie ratings more than 1.5 stars apart. In general, the average ratings of the movie and book genres are very close to one another, which means there isn’t any overwhelming favoritism toward a certain genre. The only favoritism shown is in how the movie industry chooses which books to adapt.
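The 1.5-star check above can be expressed as a per-genre gap followed by a maximum. The sketch assumes book and movie averages are already on the same 10-point scale; the helper names, genre names, and numbers are placeholders rather than the team's data.

```python
def genre_gaps(book_avgs, movie_avgs):
    """Book-minus-movie average rating for every genre present in both."""
    shared = book_avgs.keys() & movie_avgs.keys()
    return {genre: book_avgs[genre] - movie_avgs[genre] for genre in shared}


def largest_gap(gaps):
    """Largest absolute book/movie difference across genres."""
    return max(abs(gap) for gap in gaps.values())


# Placeholder genre averages on the 10-point scale.
book_avgs = {"genre_a": 8.2, "genre_b": 7.9, "genre_c": 7.6}
movie_avgs = {"genre_a": 7.0, "genre_b": 7.4, "genre_c": 7.7}

gaps = genre_gaps(book_avgs, movie_avgs)
print(largest_gap(gaps) <= 1.5)  # True when no genre exceeds 1.5 stars
```

Taking the absolute value matters: a genre where the movie outscores the book (like `genre_c` here) still counts toward the largest gap.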

“It’s all about managing risk for the studios,” Hawk Ostby, co-writer of Children of Men and producer on Syfy’s The Expanse, explained in an email to The Verge. “It’s extremely difficult to sell a blockbuster original script today if [it] isn’t based on some popular or recognizable material… Audiences know the story, so they’re sort of pre-sold on it. In other words, it has a recognizable [intellectual property] and can rise above the noise [and] competition from the internet, video games, and Netflix” (theverge.com).

So, the age-old question has been answered: overall, the data consistently shows that users rate books higher than movies. But why are books better than movies? What gives them the higher rating? Books may receive higher ratings because readers spend more time with them, because they provide more detail than the movie, or because of each user’s personal biases. It’s up to each individual to decide, because there are too many factors and aspects to consider when comparing the two works. The biggest issue movies face is time constraints, which force them to cut back on world and character development. In the end, film is its own medium, with its own set of storytelling limitations.