Predictive ML Experiments
Exploring machine learning techniques in challenging prediction domains to enhance understanding and practical skills.
The Challenge
Horse racing and forex markets represent some of the most challenging domains for predictive modeling. These environments are characterized by high noise, numerous variables, and outcomes that financial institutions and professional handicappers struggle to predict consistently.
Rather than avoiding these difficult domains, I chose them as learning laboratories to push my understanding of feature engineering, model selection, and performance evaluation in real-world scenarios where even modest improvements have significant value.
The Approach
I built two distinct applications that demonstrate different aspects of predictive ML: a horse racing predictor that processes daily race cards and historical performance data, and a forex signal generator that analyzes EUR/USD price movements for trading opportunities.
Project Details
Personal R&D
Predictive ML Learning
Ongoing Experiments
Two Experimental Applications
Horse Racing Predictor
Daily race analysis and winner prediction
Data Pipeline
Parses daily race cards, historical performance data, jockey statistics, track conditions, and betting odds to create comprehensive feature sets for each runner.
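As an illustration of how betting odds might become a per-runner feature, the sketch below converts decimal odds into implied probabilities and normalises them across the field. The `Runner` record, its field names, and the use of decimal odds are assumptions made for this example, not a description of the actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    # Hypothetical record; the field names are assumptions for illustration.
    name: str
    decimal_odds: float
    jockey_win_rate: float


def implied_probabilities(runners):
    """Convert decimal odds to probabilities that sum to 1 across the field."""
    raw = [1.0 / r.decimal_odds for r in runners]
    total = sum(raw)  # exceeds 1.0 when a bookmaker margin is built into the odds
    return [p / total for p in raw]


def build_features(runners):
    """Return one feature dict per runner in a race."""
    probs = implied_probabilities(runners)
    return [
        {
            "name": r.name,
            "implied_prob": p,
            "jockey_win_rate": r.jockey_win_rate,
        }
        for r, p in zip(runners, probs)
    ]


# Made-up three-runner field.
field = [
    Runner("A", 2.0, 0.18),
    Runner("B", 4.0, 0.12),
    Runner("C", 4.0, 0.09),
]
features = build_features(field)
```

Normalising within the race keeps the feature comparable between fields of different sizes and margins, which matters when the same model scores every race on the card.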
Model Approach
Trains fresh models daily using recent historical data, focusing on predicting top-3 finishers rather than outright winners. The top-3 target has far more positive examples per race, which makes it easier to learn in this high-variance domain.
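A minimal sketch of the two ideas above: restricting training data to a recent window before each daily retrain, and labelling runners by top-3 finish. The row layout (`race_date`, `finish_position`) and the 365-day default are assumptions for illustration only.

```python
from datetime import date, timedelta


def recent_window(rows, as_of, days=365):
    """Keep rows whose race date falls in the `days` before `as_of` (exclusive)."""
    start = as_of - timedelta(days=days)
    return [r for r in rows if start <= r["race_date"] < as_of]


def top3_labels(finish_positions):
    """Map finishing positions to 1 (top-3 finish) or 0; None means did not finish."""
    return [1 if pos is not None and pos <= 3 else 0 for pos in finish_positions]


def base_rate(labels):
    """Fraction of positive labels, i.e. the baseline any model has to beat."""
    return sum(labels) / len(labels) if labels else 0.0


# Made-up history: only the first row falls inside a 30-day window.
history = [
    {"race_date": date(2024, 5, 31), "finish_position": 2},
    {"race_date": date(2024, 6, 1), "finish_position": 1},   # same day as as_of: excluded
    {"race_date": date(2024, 4, 1), "finish_position": 7},   # too old: excluded
]
window = recent_window(history, as_of=date(2024, 6, 1), days=30)

# An eight-runner race: three positives instead of one.
labels = top3_labels([1, 5, 3, None, 2, 8, 4, 6])
```

In an eight-runner race the winner label is positive for one runner in eight, while the top-3 label is positive for three in eight. That higher base rate also means raw accuracy on the two targets is not directly comparable.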
EUR/USD Trading Signals
Currency pair prediction model
Feature Engineering
Incorporates technical indicators, price momentum, volatility measures, and time-series patterns to identify profitable entry and exit points.
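The feature families named above can be sketched in plain Python. The functions below (a simple moving average, lagged momentum, and rolling volatility of log returns) are common textbook choices picked for illustration; the specific indicators, window lengths, and prices are assumptions, not the ones used in the project.

```python
import math


def sma(prices, window):
    """Simple moving average; None until a full trailing window is available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def log_returns(prices):
    """Log return between consecutive prices (one element shorter than prices)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def momentum(prices, lag):
    """Price change over `lag` bars; None where history is too short."""
    return [None if i < lag else prices[i] - prices[i - lag] for i in range(len(prices))]


def rolling_volatility(returns, window):
    """Sample standard deviation over a trailing window (window must be >= 2)."""
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = returns[i + 1 - window : i + 1]
        mean = sum(chunk) / window
        var = sum((x - mean) ** 2 for x in chunk) / (window - 1)
        out.append(math.sqrt(var))
    return out


closes = [1.0800, 1.0820, 1.0810, 1.0850, 1.0840, 1.0870]  # made-up EUR/USD closes
vol = rolling_volatility(log_returns(closes), window=3)
```

Every function uses only trailing data, so the value at bar `i` never depends on a later price. Keeping features free of lookahead is what makes a backtest on them meaningful.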
Signal Generation
Focuses on binary buy/sell signals with confidence scoring, emphasizing risk management and position sizing over frequency of trades.
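One way to express a binary signal with a confidence score, plus a position size tied to risk, is sketched below. The 0.6 confidence threshold, the fixed-fractional sizing rule, and all parameter names are assumptions chosen for the example rather than the project's actual settings.

```python
def make_signal(prob_up, min_confidence=0.6):
    """Turn P(price rises) into a buy/sell signal, or None when confidence is too low.

    Confidence is the model's probability for the predicted direction.
    """
    direction = "buy" if prob_up >= 0.5 else "sell"
    confidence = prob_up if direction == "buy" else 1.0 - prob_up
    if confidence < min_confidence:
        return None  # stay out of the market
    return {"direction": direction, "confidence": confidence}


def position_size(equity, risk_fraction, stop_distance, value_per_unit=1.0):
    """Units to trade so that hitting the stop loses `risk_fraction` of equity."""
    if stop_distance <= 0:
        raise ValueError("stop_distance must be positive")
    return (equity * risk_fraction) / (stop_distance * value_per_unit)


signal = make_signal(0.72)
units = position_size(equity=10_000, risk_fraction=0.01, stop_distance=0.0050)
```

Returning `None` below the threshold is what trades frequency for selectivity: the model is allowed to abstain, and the sizing rule caps the loss on the trades it does take.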
Technical Learning
Feature Engineering in Noisy Domains
Both applications required careful feature selection and engineering to extract signal from noise. Horse racing demanded understanding of racing dynamics, while forex required technical analysis integration.
I learned the importance of domain expertise in feature creation and the challenges of time-series prediction in non-stationary environments, where relationships that held in older data may no longer hold.
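A common way to respect time order when validating models on non-stationary data is walk-forward splitting, where each test block comes strictly after the data used to train. The sketch below is a generic version of that technique under assumed window sizes; it is not taken from the project code.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) with each test block after its train block."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # slide the whole window forward by one test block


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print(train, test)
```

Because the training window slides rather than grows, older observations drop out over time, which fits the idea of retraining on recent data when the underlying process drifts.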
Model Performance in Real-World Conditions
Traditional accuracy metrics proved insufficient for these domains because they ignore class imbalance and the size of each payoff. I developed custom evaluation approaches focused on practical utility: top-3 predictions for racing and risk-adjusted returns for forex.
I gained a deep appreciation for the gap between laboratory performance and real-world application, especially in high-stakes prediction scenarios.
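The two utility-focused measures mentioned in this section could look roughly like the sketch below. The top-pick hit rate is one possible reading of "top-3 predictions", and the annualised Sharpe ratio is one common risk-adjusted return measure; both are assumptions for illustration, and the source does not say which formulas were actually used.

```python
import math


def top3_hit_rate(races):
    """Share of races where the model's highest-scored runner finished in the top 3.

    Each race is a list of (predicted_score, finish_position) pairs.
    """
    hits = 0
    for runners in races:
        best = max(runners, key=lambda r: r[0])
        if best[1] <= 3:
            hits += 1
    return hits / len(races)


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualised Sharpe ratio from per-period returns, using the sample std dev."""
    excess = [r - risk_free_per_period for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    std = math.sqrt(var)
    if std == 0:
        return float("nan")  # undefined when returns never vary
    return mean / std * math.sqrt(periods_per_year)


# Made-up examples: the top pick places in the first race but not the second.
races = [
    [(0.9, 1), (0.1, 5)],
    [(0.2, 2), (0.8, 6)],
]
hit_rate = top3_hit_rate(races)
```

Both functions score what the model would have been used for, a selection per race or a stream of trade returns, rather than how often a per-row classification was correct.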
Data Pipeline Architecture
I built data collection and preprocessing pipelines that tolerate inconsistent data sources, missing values, and real-time updates. These systems taught valuable lessons about production ML infrastructure and data quality management.
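As one example of handling missing values in such a pipeline, the sketch below fills gaps with the median of the observed values and adds a flag recording that the value was missing. The row format and the field name `weight` are hypothetical, and median imputation is only one of several reasonable strategies.

```python
from statistics import median


def impute_with_flag(rows, field):
    """Fill missing numeric values with the median and record that they were missing."""
    observed = [r[field] for r in rows if r.get(field) is not None]
    fill = median(observed) if observed else 0.0
    cleaned = []
    for r in rows:
        missing = r.get(field) is None  # covers both an absent key and an explicit None
        new_row = dict(r)  # copy so the input rows are left untouched
        new_row[field] = fill if missing else r[field]
        new_row[f"{field}_missing"] = int(missing)
        cleaned.append(new_row)
    return cleaned


# Made-up rows: one explicit None and one record with the key absent.
raw_rows = [{"weight": 50.0}, {"weight": None}, {"weight": 54.0}, {}]
clean_rows = impute_with_flag(raw_rows, "weight")
```

For time-ordered data the fill value should be computed from the training period only and then reused on later rows. Computing it over the full history would leak future information into the features.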
Key Insights
These experiments demonstrate that even in notoriously difficult prediction domains, thoughtful feature engineering and domain understanding can yield models that outperform random chance by meaningful margins. The real value lies not in perfect predictions but in learning to extract actionable insights from complex, noisy data.
"Satisfies ML curiosity by tackling domains where even modest accuracy improvements have significant value—shows willingness to learn through challenging real-world applications."