The problem scenario for this lab comes from our optional textbook, Fundamentals of Machine Learning for Predictive Data Analytics.

A credit card issuer has built two different credit scoring models that predict the propensity of customers to default on their loans. The table below, given in CSV format, shows the scores that the two models output for a test dataset. A lower score is better, since it indicates a lower predicted propensity to default.
ID,Target,Model 1 Score,Model 2 Score
1,bad,0.634,0.230
2,bad,0.782,0.859
3,good,0.464,0.154
4,bad,0.593,0.325
5,bad,0.827,0.952
6,bad,0.815,0.900
7,bad,0.855,0.501
8,good,0.500,0.650
9,bad,0.600,0.940
10,bad,0.803,0.806
11,bad,0.976,0.507
12,good,0.504,0.251
13,good,0.303,0.597
14,good,0.391,0.376
15,good,0.238,0.285
16,good,0.072,0.421
17,bad,0.567,0.842
18,bad,0.738,0.891
19,bad,0.325,0.480
20,bad,0.863,0.340
21,bad,0.625,0.962
22,good,0.119,0.238
23,bad,0.995,0.362
24,bad,0.958,0.848
25,bad,0.726,0.915
26,good,0.117,0.096
27,good,0.295,0.319
28,good,0.064,0.740
29,good,0.141,0.211
30,good,0.670,0.152

The lab task is to write a function to calculate the area under the ROC curve of ONE model using approximate integration.

To calculate the ROC area, your program should read in one model's prediction results at a time. The data can be pre-processed so that it is sorted by model score. For example, to calculate the ROC area for Model 1, the data fed into your program should look like this (in CSV format):
Target,Model Score
good,0.064
good,0.072
good,0.117
good,0.119
good,0.141
good,0.238
good,0.295
good,0.303
bad,0.325
good,0.391
good,0.464
good,0.5
good,0.504
bad,0.567
bad,0.593
bad,0.6
bad,0.625
bad,0.634
good,0.67
bad,0.726
bad,0.738
bad,0.782
bad,0.803
bad,0.815
bad,0.827
bad,0.855
bad,0.863
bad,0.958
bad,0.976
bad,0.995
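
As a starting point, reading such a feed might look like the minimal Python sketch below. It assumes the input is comma-separated with the header "Target,Model Score". The function name load_sorted_scores is hypothetical, and the sample rows are a small excerpt of the Model 1 data. The function sorts the rows itself, so it does not matter whether the input was pre-sorted.

```python
import csv
import io


def load_sorted_scores(csv_text):
    """Parse CSV text into (target, score) pairs sorted by ascending score.

    Assumption: the header row is exactly "Target,Model Score".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["Target"].strip(), float(row["Model Score"])) for row in reader]
    # Sort by the numeric score so each row can later serve as a threshold.
    return sorted(rows, key=lambda pair: pair[1])


# A few Model 1 rows, deliberately out of order.
sample = """Target,Model Score
bad,0.634
good,0.464
bad,0.782
good,0.064
"""

for target, score in load_sorted_scores(sample):
    print(target, score)
```

To read from a file instead, the same function could be called with the file's contents, for example load_sorted_scores(open(path).read()), where path is whatever file name you choose.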

Then, your program should use each score as a threshold value to calculate the FPR (false positive rate) and TPR (true positive rate). Each such (FPR, TPR) pair gives a point on the ROC curve.
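
The thresholding step can be sketched as follows. This is one possible reading of the task, not a reference solution, and it rests on two assumptions that the lab text does not state: "bad" is treated as the positive class, and a record is predicted positive when its score is greater than or equal to the threshold. With those conventions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name roc_points is hypothetical, and the four rows are an excerpt of the Model 1 data.

```python
def roc_points(rows, positive="bad"):
    """Return (FPR, TPR) points, one per distinct score used as a threshold.

    Assumptions: `positive` names the positive class, and a record is
    predicted positive when score >= threshold.
    """
    total_pos = sum(1 for target, _ in rows if target == positive)
    total_neg = len(rows) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative record")

    # Start at (0, 0): a threshold above every score predicts nothing positive.
    points = [(0.0, 0.0)]
    # Walk the distinct scores from highest to lowest.
    for threshold in sorted({score for _, score in rows}, reverse=True):
        tp = sum(1 for t, s in rows if s >= threshold and t == positive)
        fp = sum(1 for t, s in rows if s >= threshold and t != positive)
        points.append((fp / total_neg, tp / total_pos))
    return points


# Four rows taken from the Model 1 data.
rows = [("good", 0.504), ("bad", 0.567), ("good", 0.67), ("bad", 0.726)]
for fpr, tpr in roc_points(rows):
    print(fpr, tpr)
```

Recounting TP and FP for every threshold keeps the sketch easy to follow. Because the data is sorted, a single pass with running counts would give the same points more efficiently.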

Finally, calculate the area under the ROC curve using the (simplified) approximate integration based on these points.
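
One common form of approximate integration is the trapezoidal rule, which sums the areas of the trapezoids between consecutive points. The sketch below assumes that rule is acceptable here, since the lab text does not say which simplification is intended. The function name trapezoid_area is hypothetical, and the example points are illustrative (FPR, TPR) pairs.

```python
def trapezoid_area(points):
    """Approximate the area under a curve given as (x, y) points.

    Uses the trapezoidal rule: each pair of neighbouring points contributes
    (x1 - x0) * (y0 + y1) / 2.
    """
    ordered = sorted(points)  # order by x, then y
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative ROC points as (FPR, TPR) pairs, from (0, 0) to (1, 1).
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_area(points))  # 0.75
```

Vertical steps (same FPR, higher TPR) contribute zero width, so only the horizontal movement along the curve adds area.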

(Optional) Use your program to calculate the ROC area for Model 1 and Model 2 in the credit card prediction example above, and determine which model is better.