{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "29d7cebd-f6db-49df-a6f5-17dd87ad825a", "metadata": {}, "source": [ "# Exploratory Data Analysis" ] }, { "cell_type": "markdown", "id": "ac9d1347-c712-4a22-ac2a-b1611b5b6333", "metadata": {}, "source": [ "## 0. Importing the libraries" ] }, { "cell_type": "code", "execution_count": null, "id": "b1a44ede-d6fe-4b7a-91be-cfc9aa7976cd", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "553d905f-d4f5-4dda-90be-9496accf1ca0", "metadata": {}, "source": [ "## 1. Descriptive Statistical Analysis" ] }, { "cell_type": "markdown", "id": "1281aa3b-5e2d-4451-8894-c26483331d1b", "metadata": {}, "source": [ "### 1.1 Basic statistics for the numeric columns" ] }, { "cell_type": "code", "execution_count": null, "id": "852bec5c-924e-41a7-8633-8bc76a03d85d", "metadata": {}, "outputs": [], "source": [ "# Load the dataset, reading the file as Latin-1\n", "df = pd.read_csv(\"Documents/travaux_pratiques/autos.csv\", encoding='Latin-1')\n", "# Numeric columns to summarize\n", "num_values = ['price', 'yearOfRegistration', 'powerPS', 'odometer']\n", "df[num_values].describe()" ] }, { "cell_type": "markdown", "id": "7ff8876c-dc40-4e51-ab5c-00a42c567211", "metadata": {}, "source": [ "### 1.2 Distribution of unique values for the car brand" ] }, { "cell_type": "code", "execution_count": null, "id": "dc212545-78e7-49c1-afab-7e31d3f87d00", "metadata": {}, "outputs": [], "source": [ "# Number of listings per brand, most frequent first\n", "df['brand'].value_counts()" ] }, { "cell_type": "markdown", "id": "fedbf621-94cf-4879-8321-18418fe3020f", "metadata": {}, "source": [ "### 1.3 
Most frequent car brand" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e390669f-b9c6-4413-ad06-64fce3482bec", "metadata": {}, "source": [ "- The most frequent car brand is volkswagen.\n", "- Yes, some brands are much less represented, such as lada, lancia, and rover." ] }, { "cell_type": "markdown", "id": "dc4c6951-e411-4699-91c2-38b052faeec5", "metadata": {}, "source": [ "## 2. Data Visualization" ] }, { "attachments": {}, "cell_type": "markdown", "id": "186d7ee7-e556-43da-8408-30b5b48f4c76", "metadata": {}, "source": [ "### 2.1 Histogram of vehicle prices" ] }, { "cell_type": "code", "execution_count": null, "id": "c3990272-96c3-4646-943f-b08d0e94012d", "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", "# Histogram of prices with a kernel density estimate overlaid\n", "sns.histplot(data=df['price'], bins=30, kde=True)\n", "plt.title(\"Histogram of car prices\")\n", "plt.xlabel(\"Price\")\n", "plt.ylabel(\"Frequency\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "324343fe-c951-473b-95ad-6038801963ae", "metadata": {}, "source": [ "The price distribution is asymmetric (strongly right-skewed), which may indicate that there are many low-priced cars but few high-priced ones."
] }, { "cell_type": "markdown", "id": "e25318ab-815a-4423-81b1-fd6ff058fb41", "metadata": {}, "source": [ "### 2.2 Bar chart of the number of vehicles by year of manufacture (`yearOfRegistration`)" ] }, { "cell_type": "code", "execution_count": null, "id": "ce9cfb9f-67a0-41c5-9f60-6a1755700cb8", "metadata": {}, "outputs": [], "source": [ "# Count vehicles per registration year, in chronological order\n", "year_counts = df['yearOfRegistration'].value_counts().sort_index()\n", "plt.figure(figsize=(20, 20))\n", "year_counts.plot(kind='bar', color='skyblue')\n", "# Set the axis labels after plotting so that pandas does not overwrite them\n", "plt.xlabel(\"Year of registration\")\n", "plt.ylabel(\"Number of vehicles\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "e6f91577-5d05-4af8-b798-da55abe93bdc", "metadata": {}, "source": [ "### 2.3 Average mileage by car brand" ] }, { "cell_type": "code", "execution_count": null, "id": "ecdc027f-56ca-4b78-b49a-8a0aee1d8cbc", "metadata": {}, "outputs": [], "source": [ "# Mean odometer reading per brand, sorted from lowest to highest\n", "brand_KM_mean = df.groupby('brand')['odometer'].mean().sort_values()\n", "brand_KM_mean.plot(kind='bar')\n", "plt.title(\"Average mileage per brand\")\n", "plt.ylabel(\"Average mileage (km)\")\n", "plt.xlabel(\"Brand\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "508dd8c9-943e-4b18-bdd1-a4c8cc5443ee", "metadata": {}, "source": [ "- The brand with the lowest average mileage is trabant.\n", "- The brand with the highest average mileage is saab." ] }, { "cell_type": "markdown", "id": "83638904-7949-446a-a2a2-80950b644992", "metadata": {}, "source": [ "## 3. Relationships Between Variables" ] }, { "cell_type": "markdown", "id": "0d64da5e-e8fb-4188-aa7d-9f79298b8315", "metadata": {}, "source": [ "### 3.1 
Relationship between price and vehicle mileage" ] }, { "cell_type": "code", "execution_count": null, "id": "38e3f286-4d97-450c-b45b-b9774834c856", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Scatter plot of price against odometer reading\n", "plt.figure(figsize=(20, 20))\n", "sns.scatterplot(data=df, x='odometer', y='price')\n", "plt.title(\"Relationship between mileage and price\")\n", "plt.xlabel(\"Mileage\")\n", "plt.ylabel(\"Price\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "a29658d1-6fbe-4a66-bf4f-c92753beb10b", "metadata": {}, "source": [ "A negative relationship is generally expected: cars with higher mileage are cheaper." ] }, { "cell_type": "markdown", "id": "55b4f36d-b203-4b82-ac81-3d8b55d3f632", "metadata": {}, "source": [ "### 3.2 Relationship between year of manufacture (`yearOfRegistration`) and vehicle price" ] }, { "cell_type": "code", "execution_count": null, "id": "10e74dcf-6267-4d88-8aa6-34d5822ede05", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# One box per registration year, showing the spread of prices\n", "plt.figure(figsize=(50, 10))\n", "sns.boxplot(data=df, x='yearOfRegistration', y='price')\n", "plt.title(\"Price vs. year of registration\")\n", "plt.xlabel(\"Year of registration\")\n", "plt.ylabel(\"Vehicle price\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "004ce13f-21f0-4419-b1a7-806de30f3ffd", "metadata": {}, "source": [ "Recent cars often have higher prices."
] }, { "cell_type": "markdown", "id": "0d4c2946-c6bd-459a-9354-3b9f055963ee", "metadata": {}, "source": [ "### 3.3 Correlation matrix" ] }, { "cell_type": "code", "execution_count": null, "id": "a44c7a7f-16e9-48fd-ba72-73f842533067", "metadata": {}, "outputs": [], "source": [ "# Keep only the numeric columns: corr() fails on text columns in recent pandas versions\n", "correlation_matrix = df.select_dtypes(include='number').corr()" ] }, { "cell_type": "code", "execution_count": null, "id": "4df20346-1974-4d80-a333-a470618be481", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Annotated heatmap of the pairwise correlation coefficients\n", "plt.figure(figsize=(8, 6))\n", "sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=\".2f\")\n", "plt.title('Correlation matrix')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "56d9a440-40fa-45fb-bdbb-f3b7e9e2907c", "metadata": {}, "source": [ "### Interpretation:\n", "- Identifies the numeric variables that are strongly correlated." ] }, { "cell_type": "markdown", "id": "a366365e-1e73-48a5-97a9-ed2f1902a304", "metadata": {}, "source": [ "This matrix shows the strength and direction of the linear relationship between each pair of variables. The correlation coefficient ranges from -1 to 1:\n", "- `1` indicates a perfect positive correlation.\n", "\n", "- `-1` indicates a perfect negative correlation.\n", "\n", "- `0` indicates no linear correlation." ] }, { "cell_type": "markdown", "id": "8973b9d0-8958-4c1d-a919-1e0b12d90132", "metadata": {}, "source": [ "## 4. 
Additional Analysis Questions" ] }, { "cell_type": "markdown", "id": "678de325-1b25-4896-bdac-8d6213781d50", "metadata": {}, "source": [ "### 4.1 Distribution of fuel types" ] }, { "cell_type": "code", "execution_count": null, "id": "0a26d67d-8615-4c12-9297-cef65bb58615", "metadata": {}, "outputs": [], "source": [ "# Number of listings per fuel type\n", "ftype_values = df['fuelType'].value_counts()\n", "ftype_values" ] }, { "cell_type": "code", "execution_count": null, "id": "593f4fc3-5b46-4be7-a396-2211eec8f483", "metadata": {}, "outputs": [], "source": [ "# Pie chart of the share of each fuel type\n", "fuel_values = df['fuelType'].value_counts()\n", "fuel_values.plot(kind='pie', autopct='%1.1f%%', figsize=(8, 8))\n", "plt.title(\"Distribution of fuel types\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "4c59e4b9-e828-40db-a11c-35848d1c259f", "metadata": {}, "source": [ "Yes, one fuel type dominates and is more represented than the others: `benzin` (petrol)." ] }, { "cell_type": "markdown", "id": "4ac1af15-4691-47d5-86e1-e509f0cc0414", "metadata": {}, "source": [ "### 4.2 Average price by brand and fuel type" ] }, { "cell_type": "code", "execution_count": null, "id": "3cca8c12-c4c5-4342-bb30-719b7a5e3426", "metadata": {}, "outputs": [], "source": [ "# Mean price for each (brand, fuel type) pair\n", "avg_price = df.groupby(['brand', 'fuelType'])['price'].mean()\n", "print(avg_price)" ] }, { "cell_type": "markdown", "id": "0feb21a7-21df-4983-9084-377db48f5fd6", "metadata": {}, "source": [ "### Interpretation:\n", "- Makes it possible to compare prices by brand and fuel type.\n" ] }, { "cell_type": "markdown", "id": "14608fc2-519a-443b-ab0f-7be4b9928688", "metadata": {}, "source": [ "### 4.3 The 10 most expensive cars" ] }, { "cell_type": "code", "execution_count": null, "id": "30a85f58-9c8d-496d-b8c4-79b5c4daa7a6", "metadata": {}, "outputs": [], "source": [ "# The 10 rows with the highest price\n", "top_10_veh = df.nlargest(10, 'price')\n", "print(top_10_veh[['brand', 'price', 'yearOfRegistration', 'fuelType']])" ] }, { "cell_type": "markdown", "id": 
"4f01b0da-f2a7-4a0a-b99a-72b0fc015c21", "metadata": {}, "source": [ "### Interprétation :\n", "•\tPermet d’observer les caractéristiques communes des véhicules très chers.\n" ] }, { "cell_type": "markdown", "id": "8ec5cbf7-8238-446d-b1e3-eb919c4b115b", "metadata": {}, "source": [ "## 5. Conclusion\n", "\n", "1.\tLe véhicule Volkswagen est le plus fréquents.\n", "2.\tLa distribution des prix est asymétrique, avec de nombreuses voitures bon marché.\n", "3.\tUne relation inverse est observée entre le kilométrage et le prix.\n", "4.\tLes voitures récentes ont tendance à être plus chères.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "674e4f08-fae7-46b8-93a6-2127e86c33db", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 5 }