This project focuses on identifying potential voting irregularities and ensuring the transparency of the election results in Plateau State. As a resident and voter in Plateau State, I chose this region to leverage my familiarity with the area and provide a more informed analysis. This report documents the methodology, findings, and key insights from the outlier detection analysis conducted on the election data.
Data Collection:
Adding Geospatial Data:
Data Verification and Cleaning:
Final Dataset Preparation:
The goal of identifying neighboring polling units is to determine which units are geographically close to each other. This allows us to compare voting patterns and detect any significant deviations that might indicate irregularities or influences.
Geospatial analysis involves calculating the distances between polling units to identify neighbors. For this, I used the Haversine formula, which is suitable for calculating distances between points on the Earth's surface.
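As an illustration of the distance calculation, here is a minimal sketch of the Haversine formula in Python. The function name `haversine_km` and the Earth radius of 6371 km are assumptions made for this example, not details taken from the project code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Two polling units would count as neighbors under this sketch when `haversine_km(...)` returns a value no greater than the chosen radius.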
A radius of 1 km was chosen to define neighboring polling units. This radius is large enough to capture relevant neighbors but not so broad that it pulls in unrelated units. It was treated as a reasonable distance for comparing voting patterns in both urban and rural areas.
The BallTree algorithm from the scikit-learn library was used to find all neighboring polling units within the 1 km radius. BallTree is well suited to this task because it partitions the points into a tree and so avoids comparing every pair of units, which keeps neighbor searches fast on large datasets.
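The neighbor search could look roughly like the sketch below. The function name `find_neighbors` and its argument names are hypothetical, and the Earth radius of 6371 km is an assumption. scikit-learn's `haversine` metric expects `[latitude, longitude]` pairs in radians and measures distance in radians on a unit sphere, so the 1 km radius is divided by the Earth radius before querying.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def find_neighbors(lat_deg, lon_deg, radius_km=1.0):
    """Return, for each polling unit, the indices of other units within radius_km."""
    # The haversine metric needs [lat, lon] in radians
    coords = np.radians(np.column_stack([lat_deg, lon_deg]))
    tree = BallTree(coords, metric="haversine")
    # Convert the radius from km to radians on the unit sphere
    hits = tree.query_radius(coords, r=radius_km / EARTH_RADIUS_KM)
    # Each unit is always within the radius of itself, so drop its own index
    return [np.setdiff1d(nbrs, [i]) for i, nbrs in enumerate(hits)]
```

A unit with no other unit inside the radius gets an empty array, which makes isolated polling units easy to handle separately later on.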
Comparing the votes of a polling unit with those of its neighbors allows anomalies to be detected. A significant deviation suggests that the polling unit's results are inconsistent with the local voting pattern, which may indicate potential irregularities.
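One simple way to quantify such a deviation is the absolute difference between a unit's vote count and the mean vote count of its neighbors. The sketch below assumes that scoring rule for illustration only; the report does not state the exact formula used, and `outlier_scores` is a hypothetical name.

```python
import numpy as np


def outlier_scores(votes, neighbors):
    """Absolute deviation of each unit's votes from the mean of its neighbors' votes.

    votes: vote counts for one party, one entry per polling unit.
    neighbors: for each unit, the indices of its neighboring units.
    Units with no neighbors get a score of 0 (an assumed convention).
    """
    votes = np.asarray(votes, dtype=float)
    scores = np.zeros(len(votes))
    for i, nbrs in enumerate(neighbors):
        if len(nbrs) > 0:
            scores[i] = abs(votes[i] - votes[nbrs].mean())
    return scores
```

Sorting the units by this score in descending order would put those that differ most from their surroundings at the top for closer inspection.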