A 3 295 Euro Renault, with 136 500 km (84 817 miles)
Greetings from Montpellier,
Introduction. Great news: I got a job. Not-so-great news: To get to work each day, I’ll need a car, probably a used one. But there is a silver lining to this new item on my shopping list: an exciting data science project, because I have many questions. Which car brands are most common in France, particularly in Montpellier, and how old are they? How much should I expect to pay? Will it be absolutely necessary to buy a manual transmission car? Most cars here run on gas, right?
I will need data too, so enter Leboncoin.com, Python and BeautifulSoup. Leboncoin is the most popular online classifieds service in France, comparable to Craigslist in the US.
One can find almost anything on Leboncoin: rental properties, random junk, employment gigs, services, and vehicles. Below, I searched Leboncoin for cars (voiture in French) in Montpellier, which returned 245 listings going as far back as April 7th. A click on a listing yields a standardized page including date of listing, price, city, brand, model, model year, mileage, fuel type, and transmission.
Methods. All of the information on those pages constitutes data that could be analyzed to explore the used car market of Montpellier. How might one efficiently extract data from these webpages? Python and BeautifulSoup come to the rescue. Python is a computer scripting language, one that I used for a previous blog post that explored wineries in France. BeautifulSoup is a library of Python code that can be called to pull text and data out of webpages. Recall that at the core of nearly any webpage is Hypertext Markup Language, HTML, and below are sections of HTML from the two pages above.
Do not worry; HTML is not supposed to be intelligible to most people, including me. But it can be referenced easily with a little magic and BeautifulSoup. The underlying premise of BeautifulSoup is that the text and data in each page sit inside tags that can be referenced; to scrape a webpage, grab the text from the tags of your choice. Here’s an application of magic to Leboncoin:
The Python code (above left) imports a few libraries, including BeautifulSoup; calls the webpage with results from the search; pulls the webpage address for each of those results; and scrapes data from each listing with BeautifulSoup. I wrote my script to aggregate the scraped data into a CSV file (above right) that I loaded into another program (RStudio) for statistical analysis. For our purposes, the stats applied to the data are descriptive (means, medians, bar charts, histograms, box plots, etc.).
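Since the script itself appears only as a screenshot, here is a minimal sketch of the same four steps, and not the original code. It runs offline against two invented HTML snippets instead of calling Leboncoin. The tag names, class names, URLs, and the second listing are all hypothetical, and the real pages will be marked up differently.

```python
import csv
import io
import re

from bs4 import BeautifulSoup

# Invented stand-ins for a search-results page and its listing pages.
# Class names and URLs are hypothetical, not Leboncoin's real markup.
RESULTS_HTML = """
<ul>
  <li><a class="listing-link" href="/voitures/1001.htm">Renault</a></li>
  <li><a class="listing-link" href="/voitures/1002.htm">Peugeot</a></li>
</ul>
"""

LISTING_PAGES = {
    "/voitures/1001.htm": """
      <div class="listing">
        <span class="brand">Renault</span>
        <span class="price">3 295 €</span>
        <span class="mileage">136 500 km</span>
        <span class="fuel">Diesel</span>
        <span class="gearbox">Manuelle</span>
      </div>""",
    "/voitures/1002.htm": """
      <div class="listing">
        <span class="brand">Peugeot</span>
        <span class="price">5 200 €</span>
        <span class="mileage">98 000 km</span>
        <span class="fuel">Essence</span>
        <span class="gearbox">Manuelle</span>
      </div>""",
}


def listing_urls(results_html):
    """Pull the address of every listing linked from the results page."""
    soup = BeautifulSoup(results_html, "html.parser")
    return [a["href"] for a in soup.find_all("a", class_="listing-link")]


def to_int(text):
    """Turn '136 500 km' or '3 295 €' into an integer by keeping only digits."""
    return int(re.sub(r"\D", "", text))


def scrape_listing(listing_html):
    """Grab the text inside each tag of interest on one listing page."""
    soup = BeautifulSoup(listing_html, "html.parser")

    def text_of(css_class):
        return soup.find("span", class_=css_class).get_text(strip=True)

    return {
        "brand": text_of("brand"),
        "price_eur": to_int(text_of("price")),
        "mileage_km": to_int(text_of("mileage")),
        "fuel": text_of("fuel"),
        "transmission": text_of("gearbox"),
    }


def scrape_to_csv(results_html, pages):
    """Scrape every linked listing and aggregate the rows as CSV text."""
    rows = [scrape_listing(pages[url]) for url in listing_urls(results_html)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = scrape_to_csv(RESULTS_HTML, LISTING_PAGES)
print(csv_text)
```

In a live version, the `LISTING_PAGES` lookup would be replaced by an HTTP request per address, and the CSV text would be written to a file for loading into RStudio.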
Results. Listings on Leboncoin represented thirty-one car brands. The distribution below shows the top ten most common brands, and Renault, Peugeot and Citroen took the lead. Longue vie à la France!
For listings among the top ten brands, the median and mean vehicle ages were 10.0 and 10.7 years, respectively; for the most common brand, Renault, they were 11.0 and 11.8 years. We can see the age distribution of listed vehicles:
Vehicle mileage is critical too. Median and mean mileage were 134 700 km and 136 600 km, respectively, for all vehicles in the top ten brands. For Renault, median and mean were 136 500 km and 136 200 km, respectively.
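For readers who want to reproduce this kind of summary, here is a rough sketch of the two steps involved: keep only the most common brands, then take the median and mean for that subset and for a single brand. My analysis was done in RStudio, so this Python version is only an illustration. The ten rows are invented, and the helper names `top_brands` and `summarize` are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Invented sample rows (brand, mileage in km); NOT the scraped Leboncoin data.
listings = [
    ("Renault", 120000), ("Renault", 150000),
    ("Renault", 141000), ("Renault", 133000),
    ("Peugeot", 98000), ("Peugeot", 176000), ("Peugeot", 125000),
    ("Citroen", 134000), ("Citroen", 160000),
    ("Lancia", 210000),
]


def top_brands(rows, n):
    """Return the n brands with the most listings, most common first."""
    counts = Counter(brand for brand, _ in rows)
    return [brand for brand, _ in counts.most_common(n)]


def summarize(values):
    """Descriptive statistics for one list of numbers."""
    return {"n": len(values), "median": median(values), "mean": mean(values)}


top = top_brands(listings, 3)
overall = summarize([km for brand, km in listings if brand in top])
renault = summarize([km for brand, km in listings if brand == "Renault"])

print(top)
print(overall)
print(renault)
```

Reporting both numbers is deliberate: when the median and mean sit close together, as they do for mileage, the distribution is fairly symmetric, whereas a mean well above the median points to a few high values pulling the average up.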
How much should I expect to pay? The median and mean prices for the ten most common brands were 4 990 euros (US$ 5 563) and 6 557 euros (US$ 7 309), respectively. For a Renault, the most common brand, median and mean were 3 295 euros (US$ 3 672) and 4 399 euros (US$ 4 903), respectively.* Nonetheless, the likely price for a particular vehicle depends on its individual features, wear and use. In a future post, I will develop a regression model for a more nuanced prediction of what I might expect to pay. For now, here’s a lay of the land:
Finally, a summary of fuel type and transmission:
Learning to drive a stick would do me well, unless I have a penchant for a Citroen or BMW:
Conclusions. To explore the used car market in Montpellier, France, Leboncoin listings for used cars between April 7th and May 24th were programmatically “scraped.” Data such as brand, model, price, and other features for each listing were analyzed. Of thirty-one brands, descriptive analysis was applied to listings for the ten most common brands (n = 195 vehicles), for which the median price was 4 990 euros (US$ 5 563). The French brands Renault, Peugeot, and Citroen were the most common, totaling 105 vehicles. But a Toyota (n = 9) is still not off the table! This analysis makes clear that more choices are to be had for drivers with a penchant for manual transmissions (n = 166) and diesel engines (n = 137). The price of any used vehicle depends on mileage and other factors, and future analysis should include a regression model to 1) infer significant determinants of price and 2) develop a predictive model of pricing.
*Conversion based on an exchange rate of approximately 1.11 US$ per euro