Dataset on significant risk factors for Type 1 Diabetes: A Bangladeshi perspective

This article presents a dataset on Type-1 Diabetes together with detailed data analysis results. Type-1 Diabetes is now a serious health concern in Bangladesh. Data on 306 persons (case group: 152; control group: 154) were collected in Dhaka using a purpose-built questionnaire. The questionnaire covers 22 factors drawn from earlier research studies. The association and significance level of the factors were determined using data mining and statistical approaches and are shown in the tables of this article. In addition, parametric probabilities and a decision tree were derived to show the effectiveness of the data provided. The data can be used for future work such as risk prediction and targeted studies on Type-1 Diabetes.


The data provided can be used not only for significance analysis but also for risk prediction. These data introduce a new approach to risk factor prediction and to finding the significance level of factors as well as sub-factors.
The dataset, analyzed with both data mining and statistical approaches, allows the two approaches to be compared and illustrates the realistic outcome of the research.

Data
The data provided in this article describe different factors associated with Type-1 Diabetes. Tables 1-4 show the significance level of the factors according to Info Gain, Gain Ratio, Gini Index and the Chi-square (χ²) test. Table 1 shows the significance of the factors according to the analysis, whereas Tables 2, 3 and 4 show the significance level of sub-factors (such as symptoms and family history of Type-1 and Type-2 Diabetes). Table 5 shows the key factors in the data analysis. Table 6 shows the correlation among the significant factors, which describes the dependency among them. P values and 95% confidence intervals (C.I.) are shown in Table 7, which lists the significant factors: a factor whose P value is < 0.05 is considered significant. Table 8 gives the probability of Type-1 Diabetes according to the data. The probabilities are shown for the factors and sub-factors, which indicates the effect of those sub-factors on Type-1 Diabetes.
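As a rough illustration of the three ranking scores named above, the following Python sketch computes Info Gain, Gain Ratio and the decrease in Gini impurity for one categorical factor against the case/control label. The records and the names `family_history` and `status` are hypothetical and are not taken from the dataset; the article used Orange for these scores, whose exact definitions may differ in detail.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_by(values, labels):
    """Group the class labels by the value of one factor."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return groups


def info_gain(values, labels):
    """Class entropy minus the weighted class entropy after the split."""
    total = len(labels)
    groups = split_by(values, labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Info gain divided by the entropy of the factor itself."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


def gini_decrease(values, labels):
    """Class Gini impurity minus the weighted impurity after the split."""
    total = len(labels)
    groups = split_by(values, labels)
    remainder = sum(len(g) / total * gini(g) for g in groups.values())
    return gini(labels) - remainder


# Hypothetical toy records (assumption, not from the dataset).
family_history = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
status = ["case", "case", "case", "control", "control", "control", "case", "control"]

scores = {
    "info_gain": info_gain(family_history, status),
    "gain_ratio": gain_ratio(family_history, status),
    "gini_decrease": gini_decrease(family_history, status),
}
print(scores)
```

A higher score means that knowing the factor separates cases from controls more cleanly; ranking all 22 factors by such a score gives an ordering of the kind reported in the tables.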

Methodology of data analysis
Type 1 Diabetes is a growing concern and is increasing at an alarming rate in low-income countries such as Bangladesh. Elevated blood glucose (hyperglycemia) is the hallmark of Type-1 Diabetes in childhood [1]. Work on datasets of Type-1 Diabetes [2] in different regions of the world has been reported [3]. In this paper, a dataset on Type-1 Diabetes is provided for a low-income country, Bangladesh.

Data collection and preprocessing
Data on Type-1 Diabetes were collected from different hospitals and diagnostic centers in Dhaka, Bangladesh. Data collection followed a questionnaire, which was developed from previous research studies and discussions with medical personnel. Data were collected for both case (affected) and control (unaffected) groups, and for both males and females. The total sample size is 306, of which 152 were affected (case) and 154 were unaffected (control). A total of 22 factors (such as age, sex, area of residence, education of mother, HbA1c and BMI) were taken into account to collect useful data.
After collection, the data may contain inconsistent, missing and uncategorized values. Data preprocessing (data cleaning) was carried out using the data preprocessing feature of WEKA, a data mining tool. In previous studies [4], data were likewise preprocessed before further analysis.
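The cleaning step can be pictured with a small Python sketch. It is only an analogue of what a preprocessing filter does: the article does not state which WEKA filter or settings were used, so the choice of mode imputation, the list of missing-value markers and the sample answers below are all assumptions made for illustration.

```python
from collections import Counter

# Markers treated as "missing" (assumption; "?" is the marker used in ARFF files).
MISSING = {"", "?", "na", "n/a", "none"}


def clean_value(raw):
    """Trim and lower-case a raw answer; map missing markers to None."""
    if raw is None:
        return None
    text = str(raw).strip().lower()
    return None if text in MISSING else text


def impute_mode(column):
    """Replace None with the most frequent observed value of the column."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


# Hypothetical raw answers for one nominal factor (not from the dataset).
raw_area = [" Urban", "rural", "?", "URBAN", "", "urban "]
cleaned = impute_mode([clean_value(v) for v in raw_area])
print(cleaned)
```

Normalizing spelling first matters: without it, " Urban", "URBAN" and "urban " would be counted as three different categories.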

Data mining approach
To find the significant factors, two data mining tools, Orange and WEKA, were used. The probability of sub-factors, the χ² test, Info Gain and related measures were computed with Orange. WEKA was used for algorithm-based analysis.
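For readers who want to see what the χ² test measures, here is a minimal Python sketch for a single binary factor. The row totals 152 and 154 match the case and control group sizes, but the split into exposed and unexposed is hypothetical, and the function is a plain Pearson test without continuity correction, which is not necessarily what Orange computes.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns the statistic and its P value (1 df)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are case/control, columns are exposed/unexposed.
stat, p_value = chi_square_2x2(90, 62, 60, 94)
print(f"chi2 = {stat:.3f}, p = {p_value:.5f}")
```

A large statistic with a small P value suggests that the factor and the case/control status are not independent.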
WEKA was also used to find correlations among the factors using the Apriori algorithm. Through these procedures, the significance level of the factors in the dataset was explored.
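The idea behind Apriori can be sketched in a few lines of Python: find all item combinations that occur often enough, then measure how reliably one item implies another. The records, the `attribute=value` item encoding and the support threshold are hypothetical; this is a textbook version of the algorithm, not WEKA's implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical records encoded as attribute=value items (not from the dataset).
records = [
    {"family_history=yes", "hba1c=high", "status=case"},
    {"family_history=yes", "hba1c=high", "status=case"},
    {"family_history=no", "hba1c=high", "status=case"},
    {"family_history=no", "hba1c=normal", "status=control"},
    {"family_history=no", "hba1c=normal", "status=control"},
    {"family_history=yes", "hba1c=normal", "status=control"},
]
frequent = apriori(records, min_support=0.3)
rule_conf = confidence(frequent, {"hba1c=high"}, {"status=case"})
print(rule_conf)
```

Rules with high support and confidence point to factor values that tend to occur together, which is the sense in which Apriori output describes dependency among factors.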

Statistical approach
A statistical approach has been used to find significance and correlation in [5]. We used SPSS V20.0 to obtain the P value and confidence interval. Using the P value, the significant factors can readily be identified from the dataset.

Table 6. Correlation data among factors using the Apriori algorithm.
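To make the confidence interval part of the statistical approach concrete, the sketch below computes an odds ratio with a Wald 95% confidence interval for one binary factor. The article does not say which effect measure SPSS reported, so the use of an odds ratio is an assumption, and the counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the table
    [[a, b], [c, d]] = [[exposed cases, unexposed cases],
                        [exposed controls, unexposed controls]].
    All four cells must be non-zero; z=1.96 gives a 95% interval."""
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts (case and control totals match the 152/154 group sizes).
odds_ratio, lower, upper = odds_ratio_ci(90, 62, 60, 94)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to a factor whose association with case status is significant at roughly the 5% level.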

Significance formulation
Factors such as hyperglycemia (increased glucose level) and insulin are key factors for Type-1 Diabetes [6,7]. From the data and tables of the dataset, a final decision tree can be formed. The decision tree can then be used to describe whether a person is affected or not.
Disease risk prediction and analysis on datasets for different diseases have been carried out previously by Ahmed et al. [8]. Figs. 1-4 show the detailed analysis results of the data. The analysis was done using WEKA and Orange, two different and powerful algorithm-based data mining software packages. The results show the risk factors and their significance for detecting Type 1 Diabetes.
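How a decision tree turns ranked factors into an affected/unaffected decision can be sketched with a small ID3-style learner in Python. The article does not name the tree algorithm it used, so ID3 with information gain is an assumption, and the rows, factor names and values below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def best_factor(rows, factors, target):
    """Pick the factor with the highest information gain."""
    def gain(factor):
        groups = {}
        for row in rows:
            groups.setdefault(row[factor], []).append(row[target])
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy([row[target] for row in rows]) - remainder
    return max(factors, key=gain)


def build_tree(rows, factors, target="status"):
    """Recursively split until a node is pure or no factors remain."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not factors:
        return majority
    factor = best_factor(rows, factors, target)
    rest = [f for f in factors if f != factor]
    branches = {}
    for value in {row[factor] for row in rows}:
        subset = [row for row in rows if row[factor] == value]
        branches[value] = build_tree(subset, rest, target)
    return {"factor": factor, "branches": branches, "default": majority}


def predict(tree, row):
    """Follow the branches; fall back to the node majority for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["factor"]], tree["default"])
    return tree


# Hypothetical training rows (not from the dataset).
rows = [
    {"hba1c": "high", "family_history": "yes", "status": "case"},
    {"hba1c": "high", "family_history": "no", "status": "case"},
    {"hba1c": "high", "family_history": "no", "status": "case"},
    {"hba1c": "high", "family_history": "yes", "status": "case"},
    {"hba1c": "normal", "family_history": "yes", "status": "case"},
    {"hba1c": "normal", "family_history": "no", "status": "control"},
    {"hba1c": "normal", "family_history": "no", "status": "control"},
]
tree = build_tree(rows, ["hba1c", "family_history"])
print(predict(tree, {"hba1c": "normal", "family_history": "no"}))
```

On these toy rows the learner splits first on `hba1c` and then on `family_history`, mirroring how the most significant factor sits at the root of the tree.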

Financial support
There is no financial support for this research.