Colon Cancer: Intelligent Modeling of Risk Factors

Several factors are involved in the development of colon cancer. These factors are characterized by complexity and uncertainty, which makes the system very difficult to analyze with classical mathematical methods. Statistical analysis techniques are commonly used, but because of the complexity of the system such studies remain in the domain of the probable and the uncertain. A technique based on the principles of artificial intelligence is proposed: the principles of fuzzy inference are applied to the analysis of these factors. The proposed system makes it possible to handle the uncertainties inherent in the input variables, so that the result is as accurate as possible. The user simply sets the input variables to any values of interest and instantly reads the result at the output. The system can serve as a tool to predict and help prevent the occurrence of this type of cancer.


Introduction
Several factors may be considered risk factors for colon cancer. These factors include age. The incidence of colon cancer is increasing in young adults and in adults over 50 years of age. The American Association against Cancer recommends colorectal cancer screening starting at age 45, although the incidence rate rises with age [1]. A previous history of colorectal polyps can also constitute a risk factor. Studies have shown that the early diagnosis of colorectal cancers allows the identification of colonic polyps. Some patients will have an increased risk of developing this type of cancer. In general, the risk of colorectal cancer increases with age in both sexes. Half of the diagnosed cases occur in patients older than 70 years, whereas only 10% occur in patients younger than 50 years [2].
Another risk factor is inflammatory bowel disease. Crohn's disease and ulcerative colitis may constitute a risk for colorectal cancer. This risk increases with the duration and the anatomical extent of colitis as well as with the degree of inflammation. When colitis persists over time, it can be an aggravating factor in the development of colorectal cancer [3]. The risk of colorectal cancer is also directly associated with family ties, as demonstrated in the families of patients with colorectal cancer. Several factors may cause colorectal cancer; demographic factors (implying genetic factors) and dietary factors largely explain mortality from this cancer, although the mechanism remains poorly explained [4]. Further risk factors are a family history or a personal history of ovarian, endometrial or breast cancer.
Studies have shown that family history is directly associated with the risk of colon cancer [5]. Other studies have analyzed the relationship between colon cancer in a person and its incidence in parents and siblings [6]. Other factors may also constitute risk factors. The weight of the effect of each factor is often poorly understood. The physiological and metabolic system is very complex and varies from one person to another, so modelling such risk factors is very difficult with classical mathematical techniques. Because of the complex nature of these factors, this study proposes the application of the principles of fuzzy inference. Fuzzy logic deals with uncertainty and with situations where data are missing or incomplete [7]. The proposed system considers some factors recorded during the period 2006 to 2014 by the National Cancer Registry of Setif in Algeria. The result is a prediction of the appearance of this type of cancer from the values at the input of the system. Because the analysis takes into account the uncertainties associated with the input variables and the inaccuracy due to factors that are ignored but still have an effect, the output result will be as accurate as possible.

Materials and Methods
Recent techniques in computer science and artificial intelligence tend to imitate human reasoning. This is the foundation of advanced modelling [8][9]. Among these techniques, we use the theory of fuzzy sets. The advantage of fuzzy logic is that it handles non-linear systems with relative ease. Human expertise is translated into inference rules linking the input variables to the output variable. In recent times, the use of fuzzy analysis systems has become widespread in various fields, notably in the medical field [10]. In our case, this type of cancer is increasing in people aged 45 years and older. This can be explained by early detection and by the adoption of a Western lifestyle, including physical inactivity and a fat-rich diet [11]. This does not preclude other factors that are ignored but still have an effect, which amply justifies the use of this analysis technique.

Diagram of designed fuzzy system
The proposed system consists of three input variables, a fuzzification module, a rule database, a defuzzifier module and an output (Figure 1). The variable "Sex", expressed as a numeric value, is not fuzzified. We assign the value 1 to the male sex and the value 2 to the female sex.
The variable "Period", expressed as a numeric value, must be fuzzified. For this, three triangular membership functions are created. The fuzzy intervals between two neighboring functions allow the uncertainties associated with this representation to be handled. The functions represent the years between 2006 and 2014.
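As an illustration of this fuzzification step, the following Python sketch builds three triangular membership functions over the years 2006 to 2014. The term names ("early", "middle", "late") and the breakpoints are assumptions made for the example; the text does not specify them.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a and c and peak b."""
    if x <= a or x >= c:
        # The peak may coincide with a foot (shoulder at the edge of the range).
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical breakpoints (feet and peak) for the variable "Period".
PERIOD_TERMS = {
    "early": (2006, 2006, 2010),
    "middle": (2006, 2010, 2014),
    "late": (2010, 2014, 2014),
}


def fuzzify_period(year):
    """Return the membership degree of a year in each linguistic term."""
    return {name: triangular(year, *abc) for name, abc in PERIOD_TERMS.items()}


print(fuzzify_period(2009))
# {'early': 0.25, 'middle': 0.75, 'late': 0.0}
```

With these assumed breakpoints, the year 2009 belongs partly to "early" and partly to "middle", which is how the overlap between neighboring functions absorbs the uncertainty of the representation.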

Output data:
The variable "Incidence", expressed as a numeric value, must be fuzzified into linguistic terms. For this, three triangular membership functions are created. The fuzzy intervals between two neighbouring functions allow the uncertainties associated with this representation to be handled. We assign the range 0-40 to low incidence, 30-70 to average incidence and 60-100 to high incidence, with reference to the recorded incidence values (Table 1).
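The overlap between these ranges can be made concrete with a short Python sketch. The supports (0-40, 30-70, 60-100) come from the text; placing each peak at the midpoint of its range is an assumption, since the peaks are not stated.

```python
# Feet come from the text; the peaks (midpoints) are assumed for illustration.
INCIDENCE_TERMS = {
    "low": (0, 20, 40),
    "average": (30, 50, 70),
    "high": (60, 80, 100),
}


def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def incidence_terms(value):
    """Return only the linguistic terms to which the value belongs."""
    degrees = {name: triangular(value, *abc) for name, abc in INCIDENCE_TERMS.items()}
    return {name: mu for name, mu in degrees.items() if mu > 0}


print(incidence_terms(35))  # {'low': 0.25, 'average': 0.25}
print(incidence_terms(50))  # {'average': 1.0}
```

An incidence of 35 lies in the fuzzy interval shared by "low" and "average" and therefore belongs to both terms, whereas 50 is unambiguously "average".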

Fuzzy rules database
In general, a rule is expressed in the form (IF ... THEN):

IF (Antecedents) … THEN (Consequence).
By referring to the recorded values, we assign the set of rules in this form. Example: IF the "Age" is young AND the "Gender" is female AND the "Year" is 2010, THEN the "Incidence" is low.
The rule base must contain all possible combinations.
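A minimal Python sketch of how such a complete rule base could be enumerated and checked is given here. The age and period labels are hypothetical (only "young" appears in the text), and the consequences are left empty because they must come from the recorded data and the human expert.

```python
from itertools import product

# Assumed linguistic labels; only "young" is named in the text.
AGE_TERMS = ("young", "middle-aged", "old")
# Crisp codes from the text: 1 = male, 2 = female.
SEX_CODES = (1, 2)
# Assumed labels for the three membership functions of the period.
PERIOD_TERMS = ("early", "middle", "late")


def empty_rule_base():
    """One entry per combination of antecedents, with no consequence yet."""
    return {combo: None for combo in product(AGE_TERMS, SEX_CODES, PERIOD_TERMS)}


def missing_rules(rule_base):
    """Combinations that still lack an incidence term."""
    return [combo for combo, outcome in rule_base.items() if outcome is None]


rule_base = empty_rule_base()
# The example rule from the text, taking 2010 as the "middle" of the period.
rule_base[("young", 2, "middle")] = "low"

print(len(rule_base), len(missing_rules(rule_base)))  # 18 17
```

Under these assumptions the rule base has 3 x 2 x 3 = 18 entries, and the check lists the combinations for which a consequence has not yet been assigned.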

Inference module
Once the rule base has been established by reference to the measured values and with the intervention of the human expert, it becomes possible to build an intelligent application that predicts the incidence of colon cancer. The inference mechanism makes all these rules cooperate and maps them to the output result. The result obtained takes into account even the fuzzy intervals created between neighboring membership functions.
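The cooperation of the rules can be sketched in Python as follows. The text does not name the inference or defuzzification method, so this example assumes a Mamdani-style scheme (minimum for AND, maximum for aggregation, centroid defuzzification); the rules, labels and membership degrees are illustrative only.

```python
# Output terms: feet from the text, peaks assumed at the midpoints.
INCIDENCE_TERMS = {"low": (0, 20, 40), "average": (30, 50, 70), "high": (60, 80, 100)}


def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def infer(rules, degrees, sex, steps=1000):
    """Combine all rules and return a crisp incidence (None if no rule fires).

    rules:   list of ((age_term, sex_code, period_term), incidence_term)
    degrees: {"age": {term: degree}, "period": {term: degree}}
    """
    strength = {term: 0.0 for term in INCIDENCE_TERMS}
    for (age_term, rule_sex, period_term), out_term in rules:
        if rule_sex != sex:
            continue  # "Sex" is a crisp input: the rule applies or it does not
        firing = min(degrees["age"].get(age_term, 0.0),
                     degrees["period"].get(period_term, 0.0))  # AND as minimum
        strength[out_term] = max(strength[out_term], firing)
    numerator = denominator = 0.0
    for i in range(steps + 1):
        y = 100.0 * i / steps
        # Clip each output term at its firing strength, then aggregate by maximum.
        mu = max(min(strength[t], triangular(y, *INCIDENCE_TERMS[t]))
                 for t in INCIDENCE_TERMS)
        numerator += y * mu
        denominator += mu
    return numerator / denominator if denominator else None


# Two hypothetical rules and hypothetical membership degrees of the inputs.
rules = [(("young", 2, "middle"), "low"), (("young", 2, "late"), "average")]
degrees = {"age": {"young": 1.0}, "period": {"middle": 0.75, "late": 0.25}}
print(infer(rules, degrees, sex=2))
```

Because the period belongs to two neighboring terms, both rules fire, and the crisp result falls between the peak of "low" and the peak of "average", closer to the term with the stronger firing.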

Results
Many studies have been devoted to analyzing such data. The tools used are often classical mathematical tools such as differential equations or statistical analyses. The finding is that it is very difficult to encompass all the variables in a single mathematical equation, which makes these tools very cumbersome and the resulting equations sometimes impossible to solve. Statistical tools, for their part, remain in the domain of the probable, with its uncertainties. This study circumvents these difficulties by using an artificial intelligence tool based on fuzzy logic. The variables involved in the process are considered uncertain and are expressed in linguistic terms. Each variable is represented by triangular membership functions over an interval of levels, and linguistic terms are assigned to these intervals. Neighboring intervals overlap in fuzzy intervals, and it is through this overlap that the uncertainties are handled. The output variable is linked to the input variables so as to consider all possible combinations. Once the system is established, arbitrary values can be entered at the input and the result read instantly at the output. This result comes from the cooperation of all the rules that link the inputs to the output.

Figure 1: Structure of designed fuzzy system with three inputs, analysis module and output.

Figure 2: Example application: arbitrary setting of the inputs and direct reading of the result at the output.