Research shows that the depth and breadth of data have a greater impact on machine learning model performance than the cleverness of the algorithm. Data is the computing equivalent of human experience.
This suggests that, when possible, you can improve predictive accuracy by expanding the dataset used to build the predictive features of a machine learning model.
Think about it: there is a reason why physicians see thousands of patients during their training. It is this amount of experience, or learning, that allows them to diagnose accurately within their area of specialization. In fraud detection, a model likewise benefits from the experience gained by ingesting thousands of examples of both legitimate and fraudulent claims transactions. Superior fraud detection is achieved by analyzing an abundance of claims data to understand behavior, and assess risk, at an individual level.
At XTND, we have performed extensive research on different modelling techniques. Across a variety of use cases, our results clearly show that the volume and variety of training data matter more to prediction than the type of algorithm used. This research, along with similar independent research throughout the AI community, indicates that fraud models developed and trained on data from multiple insurance providers will be more accurate than models that rely on a relatively thin dataset.
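
To make the data-volume effect concrete, here is a minimal sketch in Python. It is a toy simulation, not XTND's method or data: the one-dimensional "risk score", the two Gaussian distributions, the nearest-centroid classifier, and the function name simulate_accuracy are all assumptions chosen for illustration. It holds the algorithm fixed and varies only the number of training examples, so it illustrates the benefit of more data but does not compare algorithms.

```python
import random
import statistics


def simulate_accuracy(n_train_per_class, n_test_per_class=500, trials=200, seed=0):
    """Mean test accuracy of a nearest-centroid classifier on synthetic data.

    Hypothetical setup: each claim is reduced to one risk score, drawn from
    N(0, 1) for legitimate claims and N(1, 1) for fraudulent ones.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(trials):
        # "Training": estimate one centroid per class from the examples seen.
        legit = [rng.gauss(0.0, 1.0) for _ in range(n_train_per_class)]
        fraud = [rng.gauss(1.0, 1.0) for _ in range(n_train_per_class)]
        c_legit = statistics.fmean(legit)
        c_fraud = statistics.fmean(fraud)

        # "Testing": classify fresh claims by the nearer centroid.
        correct = 0
        for _ in range(n_test_per_class):
            x = rng.gauss(0.0, 1.0)  # a truly legitimate claim
            if abs(x - c_legit) <= abs(x - c_fraud):
                correct += 1
            x = rng.gauss(1.0, 1.0)  # a truly fraudulent claim
            if abs(x - c_fraud) < abs(x - c_legit):
                correct += 1
        accuracies.append(correct / (2 * n_test_per_class))
    return statistics.fmean(accuracies)


if __name__ == "__main__":
    # Same algorithm each time; only the amount of training data changes.
    for n in (2, 20, 200, 2000):
        print(n, round(simulate_accuracy(n), 3))
```

With only a couple of examples per class, the estimated centroids are noisy and can even be ordered the wrong way round, so average accuracy suffers. As the training set grows, the centroids settle near their true values and accuracy approaches the best this simple model can achieve on these overlapping distributions.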