1 Summary

I use Python on a Google Compute virtual machine to analyze anonymized credit card transactions and detect fraud with an F1 score of 85.68% and an accuracy of 99.96% (compared to a baseline accuracy of 99.83% from predicting that no transactions are fraudulent). The dataset is from Kaggle and consists of 284,807 European credit card transactions (492 of which are fraudulent) spanning two days. Because the data is anonymized, the columns consist of 28 principal components, the dollar amount of the transaction, and the class of the transaction.
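
For context, the 99.83% baseline follows directly from the class imbalance. Here is a quick sketch of that calculation (assuming the Kaggle file is saved as creditcard.csv and the label column is named Class):

import pandas as pd

# Load the Kaggle dataset (file name assumed)
df = pd.read_csv("creditcard.csv")

n_fraud = (df["Class"] == 1).sum()         # 492 fraudulent transactions
n_total = len(df)                          # 284,807 transactions in total

# Accuracy of always predicting "not fraudulent"
baseline_accuracy = 1 - n_fraud / n_total  # roughly 0.9983
print(n_fraud, n_total, baseline_accuracy)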

To follow along, download my Python code from GitHub and the dataset from Kaggle.

2 Google Compute Setup

Since the dataset is large and contains so few positive cases, the models are slow to train, especially if you want to use grid search to optimize hyperparameters. I was frustrated with the computation speed of my laptop, so I looked into cloud computing platforms that offer a free trial. I tried Amazon Web Services and Microsoft Azure but found their free trials extremely limited; I believe each offered only up to 8 virtual CPU cores. I settled on Google Compute, which offers $300 of free credit and allowed me to raise my quota to 96 virtual CPU cores.

Once you have created your Google Compute account, create a new virtual machine instance. I used 96 cores and the grid search took around 6 hours to run, so you may want to reduce the number of hyperparameter candidates if you choose to use fewer cores. Connect to your virtual machine using PuTTY and run the following commands to set up Python 3 and XGBoost:

# Install Python 3 and its development headers
sudo apt update
sudo apt -y install python3 python3-dev

# Install pip for Python 3
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py

# Install git and the compilers needed to build XGBoost
sudo apt -y install git
sudo apt-get -y install g++ gfortran
sudo apt-get -y install make

# Install the Python packages the script depends on
pip install numpy --user
pip install pandas --user
pip install scikit-learn --user
pip install prettytable --user
pip install scipy --user

# Build XGBoost from source and install its Python package
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
cp make/minimum.mk ./config.mk
make -j96 # replace 96 with your number of cores
cd python-package
python3 setup.py develop --user

# Pin numpy to a specific version
pip install numpy==1.13.3 --user

Now that your Python environment is set up on the virtual machine, use FileZilla to transfer the Kaggle dataset and your script to it. Type ‘python3 script.py’ to run the script. To save the output to a file, type ‘python3 script.py > output.txt’ instead.

3 Script Overview

Here is a brief overview of my Python script. I plan to add more detail, along with additional code comments, at a later date to make the code easier to follow.

I use nested cross-validation with 3 folds (due to computing constraints) to optimize the hyperparameters of Logistic Regression, Random Forest, Naive Bayes, K-Nearest Neighbors, and XGBoost, and then compare the tuned models to see which performs best and to estimate its generalization F1 score. Nested cross-validation is necessary when optimizing hyperparameters so that the tuned models are never evaluated on the data used to tune them.

Note: Some of the nested cross-validation code is shamelessly borrowed from StackExchange.
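
To make the structure concrete, here is a minimal sketch of the nested cross-validation loop in scikit-learn. It shows a single illustrative model with a made-up parameter grid; the actual script repeats this for all five models with much larger grids, and the file name is an assumption:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score

# Load features and labels (file and column names assumed)
df = pd.read_csv("creditcard.csv")
X = df.drop("Class", axis=1).values
y = df["Class"].values

# Inner folds tune hyperparameters; outer folds score the tuned model
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=7)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=7)

# One (model, grid) pair shown for brevity; the values are illustrative only
param_grid = {"C": [0.01, 0.1, 1, 10, 30]}
grid = GridSearchCV(LogisticRegression(solver="liblinear"), param_grid,
                    scoring="f1", cv=inner_cv, n_jobs=-1)

# For each outer fold, GridSearchCV refits on the training split and the
# best estimator is scored on a held-out split it never saw during tuning
outer_scores = cross_val_score(grid, X, y, cv=outer_cv, scoring="f1")
print(outer_scores, outer_scores.mean())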

4 Results

Below is the output of my script. The 2×2 tables are the confusion matrices for each of the three outer folds (rows are the actual class, columns the predicted class; class 0 is legitimate, class 1 is fraudulent):

dilloncamp@instance-1:~/Data$ python3 modelselection4.py
Fitting 3 folds for each of 1080 candidates, totalling 3240 fits
[Parallel(n_jobs=96)]: Done   8 tasks      | elapsed:    5.0s
[Parallel(n_jobs=96)]: Done 258 tasks      | elapsed:  1.3min
[Parallel(n_jobs=96)]: Done 608 tasks      | elapsed:  4.2min
[Parallel(n_jobs=96)]: Done 1058 tasks      | elapsed:  9.6min
[Parallel(n_jobs=96)]: Done 1608 tasks      | elapsed: 17.3min
[Parallel(n_jobs=96)]: Done 2258 tasks      | elapsed: 25.8min
[Parallel(n_jobs=96)]: Done 3008 tasks      | elapsed: 36.1min
[Parallel(n_jobs=96)]: Done 3240 out of 3240 | elapsed: 42.9min finished
Fitting 3 folds for each of 1080 candidates, totalling 3240 fits
[Parallel(n_jobs=96)]: Done   8 tasks      | elapsed:    4.9s
[Parallel(n_jobs=96)]: Done 258 tasks      | elapsed:  1.3min
[Parallel(n_jobs=96)]: Done 608 tasks      | elapsed:  4.2min
[Parallel(n_jobs=96)]: Done 1058 tasks      | elapsed:  9.6min
[Parallel(n_jobs=96)]: Done 1608 tasks      | elapsed: 17.1min
[Parallel(n_jobs=96)]: Done 2258 tasks      | elapsed: 25.5min
[Parallel(n_jobs=96)]: Done 3008 tasks      | elapsed: 35.6min
[Parallel(n_jobs=96)]: Done 3240 out of 3240 | elapsed: 42.2min finished
Fitting 3 folds for each of 1080 candidates, totalling 3240 fits
[Parallel(n_jobs=96)]: Done   8 tasks      | elapsed:    4.9s
[Parallel(n_jobs=96)]: Done 258 tasks      | elapsed:  1.3min
[Parallel(n_jobs=96)]: Done 608 tasks      | elapsed:  4.2min
[Parallel(n_jobs=96)]: Done 1058 tasks      | elapsed:  9.6min
[Parallel(n_jobs=96)]: Done 1608 tasks      | elapsed: 17.5min
[Parallel(n_jobs=96)]: Done 2258 tasks      | elapsed: 26.1min
[Parallel(n_jobs=96)]: Done 3008 tasks      | elapsed: 36.5min
[Parallel(n_jobs=96)]: Done 3240 out of 3240 | elapsed: 43.4min finished
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed: 133.2min finished
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94763 |  9  |
|  1  |   33  | 131 |
+-----+-------+-----+
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94769 |  3  |
|  1  |   38  | 126 |
+-----+-------+-----+
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94764 |  7  |
|  1  |   38  | 126 |
+-----+-------+-----+
Fitting 3 folds for each of 1080 candidates, totalling 3240 fits
[Parallel(n_jobs=96)]: Done   8 tasks      | elapsed:    6.1s
[Parallel(n_jobs=96)]: Done 258 tasks      | elapsed:  1.8min
[Parallel(n_jobs=96)]: Done 608 tasks      | elapsed:  6.7min
[Parallel(n_jobs=96)]: Done 1058 tasks      | elapsed: 16.4min
[Parallel(n_jobs=96)]: Done 1608 tasks      | elapsed: 31.7min
[Parallel(n_jobs=96)]: Done 2258 tasks      | elapsed: 49.2min
[Parallel(n_jobs=96)]: Done 3008 tasks      | elapsed: 70.1min
[Parallel(n_jobs=96)]: Done 3240 out of 3240 | elapsed: 83.9min finished
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=10, min_child_weight=1, missing=None, n_estimators=150,
       n_jobs=96, nthread=None, objective='binary:logistic',
       random_state=7, reg_alpha=0.1, reg_lambda=0.01, scale_pos_weight=1,
       seed=None, silent=True, subsample=1)
Model: 6_XGB
F1 score in the 5 outer folds: [ 0.86184211  0.86006826  0.84848485].
Average F1 score: 0.8567984043778907

Fitting 3 folds for each of 20 candidates, totalling 60 fits
[Parallel(n_jobs=96)]: Done  52 out of  60 | elapsed:  1.1min remaining:    9.9s
[Parallel(n_jobs=96)]: Done  60 out of  60 | elapsed:  1.4min finished
Fitting 3 folds for each of 20 candidates, totalling 60 fits
[Parallel(n_jobs=96)]: Done  52 out of  60 | elapsed:  1.1min remaining:   10.1s
[Parallel(n_jobs=96)]: Done  60 out of  60 | elapsed:  1.4min finished
Fitting 3 folds for each of 20 candidates, totalling 60 fits
[Parallel(n_jobs=96)]: Done  52 out of  60 | elapsed:  1.2min remaining:   10.7s
[Parallel(n_jobs=96)]: Done  60 out of  60 | elapsed:  1.4min finished
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  4.7min finished
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94760 |  12 |
|  1  |   35  | 129 |
+-----+-------+-----+
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94769 |  3  |
|  1  |   39  | 125 |
+-----+-------+-----+
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94767 |  4  |
|  1  |   44  | 120 |
+-----+-------+-----+
Fitting 3 folds for each of 20 candidates, totalling 60 fits
[Parallel(n_jobs=96)]: Done  52 out of  60 | elapsed:  2.3min remaining:   21.2s
[Parallel(n_jobs=96)]: Done  60 out of  60 | elapsed:  2.5min finished
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features=10, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=96,
            oob_score=False, random_state=7, verbose=0, warm_start=False)
Model: 2_RF
F1 score in the 5 outer folds: [ 0.84590164  0.85616438  0.83333333].
Average F1 score: 0.8451331187464133

Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:    4.3s remaining:    8.5s
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:   11.4s finished
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:    4.3s remaining:    8.7s
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:   11.7s finished
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:    4.3s remaining:    8.6s
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:   11.6s finished
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   48.7s finished
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94762 |  10 |
|  1  |   62  | 102 |
+-----+-------+-----+
+-----+-------+----+
|     |   0   | 1  |
+-----+-------+----+
|  0  | 94757 | 15 |
|  1  |   77  | 87 |
+-----+-------+----+
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94754 |  17 |
|  1  |   56  | 108 |
+-----+-------+-----+
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:    4.8s remaining:    9.7s
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:   12.5s finished
LogisticRegression(C=30, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Model: 1_Logit
F1 score in the 5 outer folds: [ 0.73913043  0.65413534  0.74740484].
Average F1 score: 0.7135568724730436

Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=96)]: Done   4 out of  27 | elapsed:    2.0s remaining:   11.3s
[Parallel(n_jobs=96)]: Done  27 out of  27 | elapsed:    9.7s finished
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=96)]: Done   4 out of  27 | elapsed:    2.0s remaining:   11.2s
[Parallel(n_jobs=96)]: Done  27 out of  27 | elapsed:    9.7s finished
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=96)]: Done   4 out of  27 | elapsed:    2.0s remaining:   11.4s
[Parallel(n_jobs=96)]: Done  27 out of  27 | elapsed:    9.7s finished
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   35.8s finished
+-----+-------+------+
|     |   0   |  1   |
+-----+-------+------+
|  0  | 92409 | 2363 |
|  1  |   19  | 145  |
+-----+-------+------+
+-----+-------+------+
|     |   0   |  1   |
+-----+-------+------+
|  0  | 92531 | 2241 |
|  1  |   29  | 135  |
+-----+-------+------+
+-----+-------+------+
|     |   0   |  1   |
+-----+-------+------+
|  0  | 92484 | 2287 |
|  1  |   28  | 136  |
+-----+-------+------+
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=96)]: Done   4 out of  27 | elapsed:    2.2s remaining:   12.4s
[Parallel(n_jobs=96)]: Done  27 out of  27 | elapsed:    9.9s finished
GaussianNB(priors=[0.9, 0.1])
Model: 4_NB
F1 score in the 5 outer folds: [ 0.10853293  0.10629921  0.10514109].
Average F1 score: 0.10665774559862497

Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:  1.4min remaining:  2.7min
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:  4.6min finished
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:  1.6min remaining:  3.3min
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:  5.4min finished
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:  1.3min remaining:  2.6min
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:  4.3min finished
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed: 34.3min finished
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94758 |  14 |
|  1  |   32  | 132 |
+-----+-------+-----+
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94767 |  5  |
|  1  |   43  | 121 |
+-----+-------+-----+
+-----+-------+-----+
|     |   0   |  1  |
+-----+-------+-----+
|  0  | 94764 |  7  |
|  1  |   39  | 125 |
+-----+-------+-----+
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=96)]: Done   8 out of  24 | elapsed:  3.2min remaining:  6.4min
[Parallel(n_jobs=96)]: Done  24 out of  24 | elapsed:  7.4min finished
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=96, n_neighbors=4, p=2,
           weights='distance')
Model: 5_KNN
F1 score in the 5 outer folds: [ 0.8516129   0.83448276  0.84459459].
Average F1 score: 0.843563418813697

Average score across the outer folds:  {'2_RF': 0.84513311874641328, '6_XGB': 0.85679840437789068, '1_Logit': 0.71355687247304356, '4_NB': 0.10665774559862497, '5_KNN': 0.84356341881369701}

****************************************************************************************************
Now we choose the best model and refit on the whole dataset
****************************************************************************************************

Fitting 3 folds for each of 1080 candidates, totalling 3240 fits
[Parallel(n_jobs=96)]: Done   8 tasks      | elapsed:    6.1s
[Parallel(n_jobs=96)]: Done 258 tasks      | elapsed:  1.8min
[Parallel(n_jobs=96)]: Done 608 tasks      | elapsed:  6.7min
[Parallel(n_jobs=96)]: Done 1058 tasks      | elapsed: 16.3min
[Parallel(n_jobs=96)]: Done 1608 tasks      | elapsed: 31.4min
[Parallel(n_jobs=96)]: Done 2258 tasks      | elapsed: 48.8min
[Parallel(n_jobs=96)]: Done 3008 tasks      | elapsed: 69.8min
[Parallel(n_jobs=96)]: Done 3240 out of 3240 | elapsed: 83.3min finished
Best model:
        XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
       n_jobs=96, nthread=None, objective='binary:logistic',
       random_state=7, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
       seed=None, silent=True, subsample=1)
Estimation of its generalization error (F1 score):
        0.8567984043778907
Best parameter choice for this model:
        {'max_depth': 10, 'n_estimators': 150, 'reg_lambda': 0.01, 'reg_alpha': 0.1}
(according to cross-validation `StratifiedKFold(n_splits=3, random_state=7, shuffle=True)` on the whole dataset).
Process time:  21579.125050783157

The XGBoost model has the best average F1 score on the outer folds: 85.68%. It catches about 77% of fraudulent charges and has an overall average accuracy of 99.96%.
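
As a rough check on those figures, here is the arithmetic for the first XGBoost outer fold, reading its 2×2 table as a confusion matrix:

# Counts taken from the first XGBoost confusion matrix above
tn, fp = 94763, 9    # legitimate transactions: correct / wrongly flagged
fn, tp = 33, 131     # fraudulent transactions: missed / caught

recall = tp / (tp + fn)                      # ~0.80 of fraud caught in this fold
accuracy = (tp + tn) / (tp + tn + fp + fn)   # ~0.9996
f1 = 2 * tp / (2 * tp + fp + fn)             # ~0.862, matching this fold's F1 above
print(recall, accuracy, f1)

Averaging the recall over the three outer folds gives the roughly 77-78% detection rate quoted above.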

5 Conclusion

Despite having such a limited dataset, with only two days' worth of transactions and 492 cases of fraud out of 284,807 transactions, my model was able to correctly classify 77% of credit card fraud instances. I am confident that with a week's worth of data, I could build a much better model.