IMPORTANT NOTE: Most of the analysis on this webpage is covered in more detail in the PDF report.

To get a better understanding of the project, please go to the “Home” tab for a general explanation. Note that each tabbed plot has its own explanation, so click into each tab to see its interpretation.

These visualizations are part of the detailed analysis in the PDF report; this page gives only a brief overview of the results.

Introduction

The variables used in this analysis are:

| Variable | Description |
|---|---|
| Brand | e.g. Honda, Audi |
| Model | e.g. Civic, R8 |
| Model year | e.g. 2001 |
| Listing price | Price listed by the seller on Kijiji (CAD) |
| Market price | A “fair” asking price for a vehicle in good condition, from MotorTrend.com (CAD) |
| Mileage | Distance driven (miles) |
| Body Type | e.g. convertible, sedan, truck |
| Wheel Configuration | e.g. AWD, FWD |
| Price Range | Low, medium, or high (explained below) |

I separated the observations into three price ranges based on their market price and model year. A “low” price range car is below the 25th percentile of market price among models from the same model year. A “medium” price range car is between the 25th and 75th percentiles, and a “high” price range car is above the 75th percentile. This was intended to separate ordinary vehicles from luxury vehicles.
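A minimal pandas sketch of this classification, assuming a table with one row per model and hypothetical column names `year` and `market_price`. It uses pandas' default linear interpolation for the percentiles, which may differ from what the report used.

```python
import pandas as pd


def assign_price_range(df: pd.DataFrame) -> pd.Series:
    """Label each row 'low', 'medium' or 'high' relative to its model year.

    Assumes hypothetical columns 'year' (model year) and 'market_price',
    with one row per model.
    """
    by_year = df.groupby("year")["market_price"]
    # Per-year 25th and 75th percentiles, broadcast back to every row
    q25 = by_year.transform(lambda s: s.quantile(0.25))
    q75 = by_year.transform(lambda s: s.quantile(0.75))

    # Start with 'medium', then overwrite the tails of each year's distribution
    labels = pd.Series("medium", index=df.index)
    labels.loc[df["market_price"] < q25] = "low"
    labels.loc[df["market_price"] > q75] = "high"
    return labels
```

Usage: `df["price_range"] = assign_price_range(df)`. Cars exactly on a percentile boundary fall into the “medium” group.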



Data Preview

Market price visualizations

A problem with this dataset is that the listing data was scraped in 2019, while the market price data was scraped in 2023. Because of time constraints on this project, the 2019 listing data stands in for real-time listings.

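This implies that the market prices of these cars have most likely decreased since 2019. However, online sources state that cars typically depreciate exponentially. If every car loses roughly the same fraction of its value each year, then four years of depreciation scale each car's price by approximately the same factor. Thus, we have reason to believe that most listings differ from their market prices by a roughly constant factor, and we can proceed with the analysis.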
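As a worked version of this argument, assume (as a simplification, not something the report measures) that every car \(i\) loses value at the same continuous rate \(k\), where \(t_i\) is its model year and \(P_{i,0}\) its price when new; these symbols are introduced here for illustration. The four-year gap between the two datasets then cancels each car's starting price:

```latex
P_i(t) = P_{i,0}\, e^{-k\,(t - t_i)}
\quad\Longrightarrow\quad
\frac{P_i(2023)}{P_i(2019)}
  = \frac{P_{i,0}\, e^{-k\,(2023 - t_i)}}{P_{i,0}\, e^{-k\,(2019 - t_i)}}
  = e^{-4k}
```

Because \(e^{-4k}\) depends on neither \(P_{i,0}\) nor \(t_i\), listing and market prices should differ by roughly the same factor for every car, under this assumption.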

Figure 1: Market vs Listing Price

Here, we can check our assumptions about market price and gain some insight into the approximate change in car prices between 2019 and 2023.

Overall, these plots suggest that car market prices decay exponentially with age (the amount lost each year is proportional to the car's current value), and that most cars have similar decay rates. This suggests that vehicles are not a good long-term investment, and that buyers who want a vehicle for everyday use should typically consider cheaper cars. It also suggests that buyers could find older models of luxury vehicles at relatively low prices.

Market price vs Listing price

I fitted a line using least median of squares (LMS) regression and obtained the equation

Listing price(2019) = -2663.14 + 1.8 * Market price(2023)
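The page does not say which LMS implementation produced this fit. As a hedged illustration only, the sketch below approximates least median of squares with a common random-pair search: it fits a line through many randomly chosen pairs of points and keeps the line whose median squared residual is smallest. The function name `lms_line` and its parameters are hypothetical.

```python
import numpy as np


def lms_line(x, y, n_trials=2000, seed=0):
    """Approximate least median of squares (LMS) line fit.

    Tries lines through random pairs of points and keeps the one with the
    smallest median squared residual. Returns (intercept, slope).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    best_crit, best_b0, best_b1 = np.inf, 0.0, 0.0
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # the two points define a vertical line; skip it
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # LMS criterion: median (not sum) of squared residuals
        crit = np.median((y - intercept - slope * x) ** 2)
        if crit < best_crit:
            best_crit, best_b0, best_b1 = crit, intercept, slope
    return best_b0, best_b1
```

Calling `lms_line(market, listing)`, with hypothetical arrays of 2023 market prices and 2019 listing prices, returns an (intercept, slope) pair analogous to the equation above. Because the criterion is a median, a large share of outlying points can be present without pulling the line toward them.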

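The model suggests that the perceived value of most used cars has almost halved since 2019: ignoring the intercept, a car's 2023 market price is roughly 1/1.8 ≈ 0.56 times its 2019 listing price. This is presumably due to newer models being released and the simple passing of time.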
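A small worked check of this reading, inverting the reported fit. The annual figure additionally assumes that the slope reflects four years of exponential depreciation with the intercept ignored; that interpretation and the names below are hypothetical, not from the report.

```python
INTERCEPT = -2663.14  # intercept of the reported LMS fit
SLOPE = 1.8           # slope of the reported LMS fit


def implied_market_price(listing_2019: float) -> float:
    """Invert the fitted line to estimate a car's 2023 market price."""
    return (listing_2019 - INTERCEPT) / SLOPE


# Share of value retained over 2019-2023, ignoring the intercept
retained = 1 / SLOPE
# Implied yearly retention if that share accrued evenly over 4 years
annual_retention = SLOPE ** -0.25
```

Here `retained` is about 0.56 and `annual_retention` about 0.86, i.e. roughly 14% of value lost per year under these assumptions; `implied_market_price(20000)` gives about 12,590.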

The cone shape of the plot suggests that the variance of listing prices increases with a car's market price. This is expected, since factors affecting price, such as depreciation and damage, tend to scale with the base price of a new, accident-free vehicle (e.g., an expensive vehicle has expensive repairs, and its value decays in proportion to its original price).

However, does this proportion change across cars in different price ranges?

Log Scale Market vs Listing Price

On the log scale, the variance looks roughly constant. This suggests that the ratio of listing price to market price is roughly the same across cars of all price ranges.
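One way to see why the log scale stabilizes the variance: suppose, as an assumption, that a listing price \(L\) equals a constant \(c\) times the market price \(M\) times a positive noise factor \(\varepsilon\) that does not depend on \(M\). Then

```latex
L = c\,M\,\varepsilon
\quad\Longrightarrow\quad
\operatorname{Var}(L \mid M) = c^2 M^2 \operatorname{Var}(\varepsilon),
\qquad
\log L = \log c + \log M + \log \varepsilon
\quad\Longrightarrow\quad
\operatorname{Var}(\log L \mid M) = \operatorname{Var}(\log \varepsilon)
```

On the raw scale the spread grows with \(M\) (the cone shape), while on the log scale it no longer depends on \(M\), which matches the pattern described above.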

These plots suggest that price changes in listings are roughly proportional to a car's initial price, which is correlated with its estimated market price. Other factors must be considered to explain the remaining variance in listing prices.

Figure 2: Logged market price by year and brand

The following plots let us visualize and compare how prices change over time for different brands, and inspect individual data points to better understand their trends. I chose to plot market price because, unlike listing price, it is not influenced by individual sellers' perceptions of value. I also plotted market price on a log scale and fitted straight lines to model the exponential nature of price decay.

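Thus, for each brand, we fit a straight line on the log scale, \(\log(\text{price}) = \beta_0 + \beta_1 \cdot \text{year}\), which is equivalent to \(\text{price} = e^{\beta_0 + \beta_1 \cdot \text{year}}\). The slope \(\beta_1\) therefore roughly corresponds to the annual rate of change in price: each additional model year multiplies the price by an estimated factor of \(e^{\beta_1}\).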
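A sketch of the per-brand fit, assuming a pandas DataFrame with hypothetical columns `brand`, `year`, and `market_price`. It uses ordinary least squares via `np.polyfit` on the logged prices; the page does not state which fitting method produced the plotted lines.

```python
import numpy as np
import pandas as pd


def brand_decay_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Fit log(market_price) = b0 + b1 * year separately for each brand.

    Assumes hypothetical columns 'brand', 'year' and 'market_price'.
    Returns b0, b1 and the multiplicative change per model year, exp(b1).
    """
    rows = []
    for brand, group in df.groupby("brand"):
        # np.polyfit with deg=1 returns [slope, intercept] (least squares)
        b1, b0 = np.polyfit(group["year"], np.log(group["market_price"]), deg=1)
        rows.append({"brand": brand, "b0": b0, "b1": b1, "rate": np.exp(b1)})
    return pd.DataFrame(rows).set_index("brand")
```

A `rate` of 1.1, for example, means each newer model year of that brand is estimated to cost about 10% more, or equivalently that prices fall by a factor of about 1/1.1 per year of age.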

Overall, these plots suggest that depreciation does not differ much between brands. However, each brand's data mixes many different models, and some brands have few observations, so the fits are heavily affected by outliers and further investigation is needed to confirm these findings.

Depreciation rates certainly differ between individual vehicles and are an important consideration when making a purchase, but these plots suggest that they do not differ much between brands, so consumers can feel free to choose the brands they enjoy.

Coupes, Sedans, Convertibles, etc.

Below are the fitted slope coefficients for each line. They can be interpreted using the model given in the description above. For example, the model suggests brands such as Porsche, Acura, Toyota and Honda have the lowest yearly rate of change, and that brands such as Cadillac, Audi, Dodge and BMW have the highest.

This may be because brands such as Acura, Toyota, and Honda are known for low maintenance costs and few mechanical problems over a car's lifespan, whereas brands such as Audi and BMW typically have high maintenance costs. However, the noise and outliers in the data make it difficult to draw definitive conclusions.

Trucks, Vans etc.

Below are the fitted slope coefficients for each line. These do not give much insight, as most coefficients are similar and estimates are noisy.

The plot for larger vehicles, such as trucks and vans, appears much less noisy, as most lines are almost parallel. This may be because there is much more variation among smaller vehicles (e.g., sports cars, luxury vehicles) than among vans and trucks. The parallel lines suggest that, for trucks and vans, price decay rates are similar across brands.

The same brands appear among the highest and lowest fitted slope coefficients as in the previous plot (e.g., Toyota and Acura among the lowest slopes, BMW and Audi among the highest), which supports our earlier findings.

Other

This plot does not yield any results; it simply shows the cars that were classified as “Other.” Inspecting their makes and models, most of these vehicles appear to have been misclassified, while some do belong in this category. Perhaps sellers did not declare the vehicle type on the site.

Figure 3: Comparing other car factors

When considering a car purchase, it is important to know approximately what is available within a given price range. Analyzing the price distributions of cars with different body types and wheel configurations shows which cars are available at different budgets, which is useful when a specific function is required (e.g., a truck for transporting equipment, or an all-wheel-drive vehicle for snowy weather).

Listing price vs body type

Cars with different body types have different listing price distributions. This makes intuitive sense, as larger vehicles will probably cost more than smaller cars. Pickup trucks, convertibles, and coupes have the highest prices.

Overall, this suggests that body type may be associated with price and may be a useful predictor of car price.

By wheel configuration

In addition to body type, wheel configuration also seems to be associated with price. We see 4x4 vehicles have the highest median price.
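The comparisons in this figure come down to per-category summaries of listing price. A minimal pandas sketch, assuming a hypothetical `listing_price` column and a grouping column such as body type or wheel configuration, that reports the quartiles for each category and sorts by median:

```python
import pandas as pd


def price_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Summarize listing prices per category (e.g. body type or drivetrain).

    Assumes a hypothetical 'listing_price' column; 'by' names the grouping column.
    """
    # describe() gives count, mean, std, min, quartiles and max per group
    summary = df.groupby(by)["listing_price"].describe()
    summary = summary[["count", "25%", "50%", "75%"]]
    # Highest-median categories first
    return summary.sort_values("50%", ascending=False)
```

For example, `price_summary(df, "wheel_configuration")` (a hypothetical column name) would list the drivetrain with the highest median listing price first.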

Figure 4: Comparing market price decay with miscellaneous factors

Our results suggest that cars that provide utility, such as trucks, may experience lower decay rates than other, more “generic” vehicles. In addition, luxury vehicles and sports cars, which are often rear-wheel drive, may experience lower decay rates due to the social status associated with them.

However, price decay is still quite similar between different vehicle classes, and the data has a lot of noise.

Logged market price vs year, wheel configuration

Here, the slope of the line indicates the approximate rate of exponential decay in price. We see that 4x4 and RWD vehicles have a lower slope compared to AWD and FWD vehicles, suggesting they have lower decay rates.

This may be because most 4x4 vehicles are trucks, which provide a lot of utility and thus depreciate more slowly. In addition, many rear-wheel-drive vehicles are sports cars and other luxury vehicles, which may experience lower decay rates due to the status associated with them.

Numerical statistics

Because of the noise in the data, I used least median of squares regression to quantify these findings, as it is less sensitive to outliers.

This fits the curve \(Y = \exp(\beta_0 + \beta_1 X)\), where \(Y\) is the listing price of a car and \(X\) is its model year. A higher slope (\(\beta_1\)) suggests a higher decay rate: the model projects a larger increase in price with each newer model year, or equivalently a larger drop in price for each year of age.
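To make the direction of the slope concrete, compare the fitted price for two consecutive model years under this model:

```latex
\frac{Y(X+1)}{Y(X)} = \frac{e^{\beta_0 + \beta_1 (X+1)}}{e^{\beta_0 + \beta_1 X}} = e^{\beta_1},
\qquad
\text{loss per year of age} = 100\,\bigl(1 - e^{-\beta_1}\bigr)\%
```

So a car one model year older is predicted to cost \(e^{-\beta_1}\) times as much. A larger \(\beta_1\) means a larger fraction of value lost per year of age, while a negative \(\beta_1\) would mean older cars are predicted to cost more.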

Here, we confirm our previous findings that rear-wheel-drive and 4x4 vehicles have the lowest slope estimates, and thus the lowest decay rates.

Here we see that coupes and pickup trucks have the lowest slope estimates, which supports our previous conclusion. However, the coefficient for coupes is negative, which would imply that older coupes are priced higher than newer ones; this may be the result of outliers in the data. We also see that convertibles actually have the highest slope estimate.

Overall, there is some evidence supporting the analysis in this section, but the small size and noise of the dataset make it difficult to draw definitive conclusions.