Data-Driven Assessment of Soil Heavy Metal Contamination in Joinkrama, Rivers State, Nigeria Using Pollution Indices and Multivariate Analytics
Okes Imoni *
Department of Public Health, University of Sunderland, London, United Kingdom.
Erebi Lisa Jonathan
Department of Geology, Niger Delta University, Wilberforce Island, Bayelsa State, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
This study applies data science methods to quantitatively assess soil contamination in Joinkrama, Ahoada West LGA, Rivers State, Nigeria. Following a structured geospatial and statistical pipeline, soil samples were collected at stratified depths between 0.5 and 1 m from three sites (S1, S2, S3) and a control point (C1). Concentrations of the heavy metals cadmium (Cd), lead (Pb), nickel (Ni), chromium (Cr), copper (Cu), and zinc (Zn) were measured by Atomic Absorption Spectroscopy (AAS) and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Data preprocessing included normalization and outlier detection using interquartile range (IQR) thresholds. Exploratory Data Analysis (EDA) revealed spatial variability, with elevated mean concentrations of Cd (0.03 ± 0.01 mg/kg), Pb (0.10 ± 0.04 mg/kg), and Zn (0.16 ± 0.07 mg/kg), notably at S1 and S3. To quantify contamination levels, the Contamination Factor (CF), Degree of Contamination (Cdeg), Pollution Load Index (PLI), Geo-accumulation Index (Igeo), and Potential Ecological Risk Index (PERI) were computed programmatically, revealing high-contamination clusters with PERI > 150 in hotspot zones. Dimensionality reduction by Principal Component Analysis (PCA) identified Cd, Pb, and Zn as key anthropogenic signatures, a grouping corroborated by Hierarchical Cluster Analysis (HCA) using Ward's method and Euclidean distance. These contaminants were spatially associated with known industrial and agricultural activities, including gas flaring and agrochemical deposition. The results demonstrate the utility of computational pollution modeling for environmental risk profiling and decision support. This work presents a replicable, scalable analytical framework for soil quality surveillance and for prioritizing remediation in oil-impacted regions of the Niger Delta.
Keywords: Soil contamination, heavy metals, data science, PCA, pollution indices, cluster analysis, environmental modeling, Python, spatial analytics
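The pollution indices and the IQR screening step named in the abstract follow widely used formulas: CF_i = C_i / B_i, Cdeg = ΣCF_i, PLI = (ΠCF_i)^(1/n), Igeo_i = log2(C_i / (1.5·B_i)), and PERI = Σ(Tr_i·CF_i). The following is a minimal Python sketch of how they might be computed, not the authors' code. Several points are assumptions: the function names are hypothetical, the background concentrations B_i are taken from a control point such as C1, the toxic response factors Tr_i are the values commonly cited in the literature (the study's own values are not given here), outliers are flagged with Tukey fences at k = 1.5, and all concentrations in the demo are hypothetical rather than measured data.

```python
import math
import statistics

# ASSUMPTION: toxic response factors commonly cited after Hakanson (1980);
# the Ni value (5) comes from later literature. Not confirmed by this study.
TOXIC_RESPONSE = {"Cd": 30, "Pb": 5, "Ni": 5, "Cr": 2, "Cu": 5, "Zn": 1}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences; k=1.5 assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def contamination_factors(sample, background):
    """CF_i = C_i / B_i for every metal measured at both the site and the background."""
    return {m: sample[m] / background[m] for m in sample if m in background}


def degree_of_contamination(cf):
    """Cdeg = sum of the individual contamination factors."""
    return sum(cf.values())


def pollution_load_index(cf):
    """PLI = n-th root of the product of n contamination factors (geometric mean)."""
    return math.prod(cf.values()) ** (1.0 / len(cf))


def geoaccumulation_index(sample, background):
    """Igeo_i = log2(C_i / (1.5 * B_i)); 1.5 allows for natural background variation."""
    return {m: math.log2(sample[m] / (1.5 * background[m]))
            for m in sample if m in background}


def potential_ecological_risk(cf, tr=TOXIC_RESPONSE):
    """Return per-metal risk Er_i = Tr_i * CF_i and the total index PERI = sum(Er_i)."""
    er = {m: tr[m] * cf[m] for m in cf}
    return er, sum(er.values())


if __name__ == "__main__":
    # HYPOTHETICAL concentrations in mg/kg, not the study's measurements.
    background = {"Cd": 0.02, "Pb": 0.10, "Zn": 0.20}  # e.g. a control point
    site = {"Cd": 0.06, "Pb": 0.20, "Zn": 0.40}

    cf = contamination_factors(site, background)
    er, peri = potential_ecological_risk(cf)
    print("CF:", {m: round(v, 2) for m, v in cf.items()})
    print("Cdeg:", round(degree_of_contamination(cf), 2))
    print("PLI:", round(pollution_load_index(cf), 2))
    print("Igeo:", {m: round(v, 2)
                    for m, v in geoaccumulation_index(site, background).items()})
    print("Er:", {m: round(v, 1) for m, v in er.items()}, "PERI:", round(peri, 1))
```

Keeping each index as a small pure function over per-metal dictionaries makes it straightforward to apply the same calculation to every site (S1, S2, S3) against one background, and to swap in different toxic response factors or background values if the study uses other references.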