The dataset used in this project is the Statlog German Credit Data from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data). It contains information about loans and credit risk in Germany, and the goal is to predict whether a borrower is a good or a bad credit risk. The dataset has become one of the standard references for classification tasks in finance and credit risk management.
How to Import the Data
Installing the ucimlrepo library
# pip install ucimlrepo
Successfully installed ucimlrepo-0.0.7
Note: you may need to restart the kernel to use updated packages.
Import data
from ucimlrepo import fetch_ucirepo

# fetch dataset
statlog_german_credit_data = fetch_ucirepo(id=144)

# data (as pandas dataframes)
X = statlog_german_credit_data.data.features
y = statlog_german_credit_data.data.targets

# metadata (produces the output below)
print(statlog_german_credit_data.metadata)
uci_id: 144
name: Statlog (German Credit Data)
repository_url: https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data
data_url: https://archive.ics.uci.edu/static/public/144/data.csv
abstract: This dataset classifies people described by a set of attributes as good or bad credit risks. Comes in two formats (one all numeric). Also comes with a cost matrix
area: Social Science
tasks: ['Classification']
characteristics: ['Multivariate']
num_instances: 1000
num_features: 20
feature_types: ['Categorical', 'Integer']
demographics: ['Other', 'Marital Status', 'Age', 'Occupation']
target_col: ['class']
index_col: None
has_missing_values: no
missing_values_symbol: None
year_of_dataset_creation: 1994
last_updated: Thu Aug 10 2023
dataset_doi: 10.24432/C5NC77
creators: ['Hans Hofmann']
intro_paper: None
additional_info:
  summary:
    Two datasets are provided. the original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file "german.data".

    For algorithms that need numerical attributes, Strathclyde University produced the file "german.data-numeric". This file has been edited and several indicator variables added to make it suitable for algorithms which cannot cope with categorical variables. Several attributes that are ordered categorical (such as attribute 17) have been coded as integer. This was the form used by StatLog.

    This dataset requires use of a cost matrix (see below)

    .....   1    2
    ----------------------------
      1     0    1
    -----------------------
      2     5    0

    (1 = Good, 2 = Bad)

    The rows represent the actual classification and the columns the predicted classification.

    It is worse to class a customer as good when they are bad (5), than it is to class a customer as bad when they are good (1).
  purpose: None
  funded_by: None
  instances_represent: None
  recommended_data_splits: None
  sensitive_data: None
  preprocessing_description: None
  variable_info:
    Attribute 1: (qualitative)
      Status of existing checking account
      A11 : ... < 0 DM
      A12 : 0 <= ... < 200 DM
      A13 : ... >= 200 DM / salary assignments for at least 1 year
      A14 : no checking account

    Attribute 2: (numerical)
      Duration in month

    Attribute 3: (qualitative)
      Credit history
      A30 : no credits taken/ all credits paid back duly
      A31 : all credits at this bank paid back duly
      A32 : existing credits paid back duly till now
      A33 : delay in paying off in the past
      A34 : critical account/ other credits existing (not at this bank)

    Attribute 4: (qualitative)
      Purpose
      A40 : car (new)
      A41 : car (used)
      A42 : furniture/equipment
      A43 : radio/television
      A44 : domestic appliances
      A45 : repairs
      A46 : education
      A47 : (vacation - does not exist?)
      A48 : retraining
      A49 : business
      A410 : others

    Attribute 5: (numerical)
      Credit amount

    Attibute 6: (qualitative)
      Savings account/bonds
      A61 : ... < 100 DM
      A62 : 100 <= ... < 500 DM
      A63 : 500 <= ... < 1000 DM
      A64 : .. >= 1000 DM
      A65 : unknown/ no savings account

    Attribute 7: (qualitative)
      Present employment since
      A71 : unemployed
      A72 : ... < 1 year
      A73 : 1 <= ... < 4 years
      A74 : 4 <= ... < 7 years
      A75 : .. >= 7 years

    Attribute 8: (numerical)
      Installment rate in percentage of disposable income

    Attribute 9: (qualitative)
      Personal status and sex
      A91 : male : divorced/separated
      A92 : female : divorced/separated/married
      A93 : male : single
      A94 : male : married/widowed
      A95 : female : single

    Attribute 10: (qualitative)
      Other debtors / guarantors
      A101 : none
      A102 : co-applicant
      A103 : guarantor

    Attribute 11: (numerical)
      Present residence since

    Attribute 12: (qualitative)
      Property
      A121 : real estate
      A122 : if not A121 : building society savings agreement/ life insurance
      A123 : if not A121/A122 : car or other, not in attribute 6
      A124 : unknown / no property

    Attribute 13: (numerical)
      Age in years

    Attribute 14: (qualitative)
      Other installment plans
      A141 : bank
      A142 : stores
      A143 : none

    Attribute 15: (qualitative)
      Housing
      A151 : rent
      A152 : own
      A153 : for free

    Attribute 16: (numerical)
      Number of existing credits at this bank

    Attribute 17: (qualitative)
      Job
      A171 : unemployed/ unskilled - non-resident
      A172 : unskilled - resident
      A173 : skilled employee / official
      A174 : management/ self-employed/ highly qualified employee/ officer

    Attribute 18: (numerical)
      Number of people being liable to provide maintenance for

    Attribute 19: (qualitative)
      Telephone
      A191 : none
      A192 : yes, registered under the customers name

    Attribute 20: (qualitative)
      foreign worker
      A201 : yes
      A202 : no
  citation: None
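The cost matrix in the metadata can be applied to a set of predictions to get a single cost figure. The following is a minimal sketch under stated assumptions: the helper name `total_cost` and the example label lists are hypothetical and are not part of the dataset or of `ucimlrepo`. Only the matrix values and the coding (1 = Good, 2 = Bad; rows are actual, columns are predicted) come from the dataset description.

```python
# Cost matrix from the dataset description: COST[actual][predicted].
# Labels: 1 = Good, 2 = Bad.
COST = {
    1: {1: 0, 2: 1},  # actual Good: predicting Bad costs 1
    2: {1: 5, 2: 0},  # actual Bad: predicting Good costs 5
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired labels (hypothetical helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[a][p] for a, p in zip(actual, predicted))


# Illustrative labels only, not taken from the dataset.
actual = [1, 2, 2, 1, 1]
predicted = [1, 1, 2, 2, 1]

# One Bad predicted as Good (5) plus one Good predicted as Bad (1).
print(total_cost(actual, predicted))  # 6
```

Because a bad customer classified as good costs five times as much as the opposite error, two models with the same accuracy can have very different total costs under this matrix.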
# export to pandas dataframe
import pandas as pd

df = pd.DataFrame(X)
df['target'] = y
df.to_csv('statlog_german_credit_data.csv', index=False)
df
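| | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5 | Attribute6 | Attribute7 | Attribute8 | Attribute9 | Attribute10 | ... | Attribute12 | Attribute13 | Attribute14 | Attribute15 | Attribute16 | Attribute17 | Attribute18 | Attribute19 | Attribute20 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A11 | 6 | A34 | A43 | 1169 | A65 | A75 | 4 | A93 | A101 | ... | A121 | 67 | A143 | A152 | 2 | A173 | 1 | A192 | A201 | 1 |
| 1 | A12 | 48 | A32 | A43 | 5951 | A61 | A73 | 2 | A92 | A101 | ... | A121 | 22 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 2 |
| 2 | A14 | 12 | A34 | A46 | 2096 | A61 | A74 | 2 | A93 | A101 | ... | A121 | 49 | A143 | A152 | 1 | A172 | 2 | A191 | A201 | 1 |
| 3 | A11 | 42 | A32 | A42 | 7882 | A61 | A74 | 2 | A93 | A103 | ... | A122 | 45 | A143 | A153 | 1 | A173 | 2 | A191 | A201 | 1 |
| 4 | A11 | 24 | A33 | A40 | 4870 | A61 | A73 | 3 | A93 | A101 | ... | A124 | 53 | A143 | A153 | 2 | A173 | 2 | A191 | A201 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | A14 | 12 | A32 | A42 | 1736 | A61 | A74 | 3 | A92 | A101 | ... | A121 | 31 | A143 | A152 | 1 | A172 | 1 | A191 | A201 | 1 |
| 996 | A11 | 30 | A32 | A41 | 3857 | A61 | A73 | 4 | A91 | A101 | ... | A122 | 40 | A143 | A152 | 1 | A174 | 1 | A192 | A201 | 1 |
| 997 | A14 | 12 | A32 | A43 | 804 | A61 | A75 | 4 | A93 | A101 | ... | A123 | 38 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 1 |
| 998 | A11 | 45 | A32 | A43 | 1845 | A61 | A73 | 4 | A93 | A101 | ... | A124 | 23 | A143 | A153 | 1 | A173 | 1 | A192 | A201 | 2 |
| 999 | A12 | 45 | A34 | A41 | 4576 | A62 | A71 | 3 | A93 | A101 | ... | A123 | 27 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 1 |

1000 rows × 21 columns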
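The `target` column uses the dataset's own coding, 1 for a good and 2 for a bad credit risk. Many classifiers expect a 0/1 label, so recoding is a common preprocessing step. The sketch below is illustrative only: it works on the ten target values visible in the preview (rows 0 to 4 and 995 to 999) rather than on the full dataset, and the helper name `to_binary` and the choice of 1 = bad are assumptions.

```python
from collections import Counter

# Target values of the ten rows shown in the preview (rows 0-4 and 995-999).
preview_targets = [1, 2, 1, 1, 2, 1, 1, 1, 2, 1]


def to_binary(label):
    """Map the dataset's coding (1 = Good, 2 = Bad) to 0 = good, 1 = bad."""
    mapping = {1: 0, 2: 1}
    if label not in mapping:
        raise ValueError(f"unexpected target value: {label!r}")
    return mapping[label]


binary = [to_binary(t) for t in preview_targets]
counts = Counter(binary)

print(binary)
print(counts[0], "good,", counts[1], "bad")
```

On the full DataFrame, the same recoding could be written with pandas as `df['target'].map({1: 0, 2: 1})`.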
Dataset Description
Input Dataset:
The Statlog German Credit dataset contains 20 attributes that describe borrowers' characteristics and their credit history.
Attributes in the Dataset:
Status of existing checking account (categorical: A11, A12, A13, A14)
A11: < 0 DM
A12: 0 <= ... < 200 DM
A13: >= 200 DM, or salary assignments for at least 1 year
A14: no checking account
Credit duration in months (numerical)
Credit history (categorical: A30, A31, A32, A33, A34)
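Because the categorical attributes are stored as codes such as `A11`, it can help to decode them into readable labels when exploring the data. The following is a minimal sketch, assuming a plain dictionary lookup: the names `CHECKING_ACCOUNT_LABELS` and `decode` are hypothetical, and the label text is copied from the `variable_info` field of the dataset metadata.

```python
# Labels for Attribute1 (status of existing checking account),
# copied from the variable_info text in the dataset metadata.
CHECKING_ACCOUNT_LABELS = {
    "A11": "... < 0 DM",
    "A12": "0 <= ... < 200 DM",
    "A13": "... >= 200 DM / salary assignments for at least 1 year",
    "A14": "no checking account",
}


def decode(code, labels):
    """Return the readable label for a code, or the code itself if unknown."""
    return labels.get(code, code)


# Attribute1 values of the first five rows in the data preview.
first_rows = ["A11", "A12", "A14", "A11", "A11"]
for code in first_rows:
    print(code, "->", decode(code, CHECKING_ACCOUNT_LABELS))
```

Falling back to the raw code for unknown values keeps the lookup from failing if a column contains a code that is missing from the dictionary.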