This article explains the measures of frequency and association used in the analysis of observational epidemiological data. Observational studies include cohort, case-control and cross-sectional studies. In epidemiology, most variables are nominal with only two categories, such as exposed or unexposed, male or female, case or control, so ratios, rates, and proportions are used to analyse these dichotomous variables. Fictional data from different studies were used to calculate the incidence rate, relative risk, mortality rate, odds ratio and prevalence of diseases. The odds ratio and relative risk are called measures of association because they quantify the relationship between exposure and outcome. Incidence rate, relative risk, and mortality rate were calculated for cohort studies, the odds ratio for a case-control study, and prevalence for a cross-sectional study. The appropriate measure depends on the type of research.
Keywords: Frequency measure; Observational studies; Human Immunodeficiency Virus; Epidemiology
People suffer from diseases such as cancer, HIV/AIDS, diabetes, hypertension, heart disease, malaria and sickle cell anemia, among others. Polio also affects children in some developing countries. Many health-related problems afflict people all over the world. How to measure diseases, determine their causes and plan appropriate means of controlling them and their occurrence are important issues addressed by epidemiology. Measures of frequency and association are very useful for that purpose and are regarded as the foundation of descriptive epidemiology.
Epidemiology is defined by [1] as “the study of the distribution and determinant of health-related events in a specified population and the impact of this study to control of health-related problems”. Any variable or factor that can affect the frequency with which a disease occurs in a population is referred to as a ‘determinant’ [2].
Epidemiology is a very important field that is used by governments and health organizations, among others, to determine important aspects of human conditions in a particular population. Such aspects include natality, morbidity and mortality, and they are described by rates, ratios and proportions. The main concern of epidemiology is to measure health, discover what brings about disease and intervene to cure the disease and overcome its causes [3]. The role of epidemiology extends beyond disease itself to the improvement of health, the control of disease and the development of a framework for analysing health-related problems.
Experimental and observational study designs are the two basic study designs in epidemiology. In experimental studies, the researcher intervenes to modify reality and then observes what happens, while in observational studies, the researcher records what occurs but does not make any modification [4]. Randomized controlled trials and quasi-experimental designs are types of experimental study design, while cohort, case-control and cross-sectional studies are the three most common types of observational study design.
Data analysis is crucial in epidemiological research: it helps to organise and structure the findings from different sources of data collection, and appropriate statistical treatment helps to keep human bias out of the conclusions [5]. Since many of the variables typically used in observational studies are dichotomous, measures of frequency and association are used to analyse the data, either to determine the occurrence of disease or to measure the association between exposure and outcome.
This article focuses on some measures of frequency and association calculated for cohort, case-control and cross-sectional studies.
In observational studies, a researcher observes and systematically gathers relevant information but does not attempt to modify the subjects being observed. Unlike in experimental studies, where a researcher intervenes to alter something (e.g., gives a drug to a treatment group) and then observes what occurs, the researcher makes no intervention in an observational study. Examples of observational studies include a survey of smoking habits among adolescents, a study of breast cancer among women aged between 25 and 60, and a study of maladaptive behaviors among high school students.
Observational studies are carried out when a researcher cannot perform an experiment, when an experiment would not be acceptable or when the study is not experimental in nature. They are also carried out when the primary aim of the researcher is to obtain descriptive information. Cohort, case-control and cross-sectional studies are the three most common observational studies (Figure 1).
Figure 1: Cohort, case-control and cross-sectional studies are the three most common observational studies [8].
Ratios, proportions, and rates are used in epidemiology to describe birth, disease and death. The birth rate, the mortality rate and the prevalence or incidence rate of a disease can be calculated using data derived from observational studies.
It is vital to consider the concept of the ‘confidence interval’ (CI) because observational studies are subject to random sampling error, so the result obtained may differ from reality by chance. A confidence interval is calculated to assess the possible impact of this sampling error. The most commonly used confidence intervals in health-related research are 95% intervals. For relative risk (RR), the null or ‘no-effect’ value is 1.0: an RR of 1.0 indicates that the two groups being compared do not differ. If both ends of the confidence interval are less than 1.0, this indicates an inverse relationship between exposure and outcome; similarly, a positive relationship exists if both ends of the CI are greater than 1.0. However, if the CI includes the null value, i.e. the upper limit is greater than 1.0 and the lower limit is less than 1.0, then the researcher cannot rule out the possibility that the true RR is 1.0, and thus that no relationship exists between exposure and outcome [3].
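As an illustration of the interpretation rule above, the following Python sketch computes an approximate 95% CI for a relative risk and compares it with the null value of 1.0. The log-based (Katz) standard error, the function names and the 2×2 counts are assumptions made for this example; they are not taken from the studies discussed in this article.

```python
import math


def relative_risk_ci(a, b, c, d, z=1.96):
    """Return (rr, lower, upper) for a 2x2 cohort table.

    a = exposed with the outcome, b = exposed without the outcome,
    c = unexposed with the outcome, d = unexposed without the outcome.
    z = 1.96 gives an approximate 95% interval.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) using the log (Katz) method: an assumed,
    # commonly used approximation, not a method stated in the article.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def interpret(lower, upper, null_value=1.0):
    """Classify a confidence interval relative to the null value."""
    if lower > null_value:
        return "positive association"
    if upper < null_value:
        return "inverse association"
    return "interval includes the null value"


# Hypothetical counts, for illustration only.
rr, lower, upper = relative_risk_ci(a=30, b=70, c=10, d=90)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}: {interpret(lower, upper)}")
```

With these hypothetical counts, both limits lie above 1.0, so the sketch reports a positive association; an interval such as 0.8 to 1.2 would instead be reported as including the null value.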
Cohort study: incidence rate, relative risk and mortality rate
Data from a cohort study can be analyzed using the incidence rate, relative risk and mortality rate. Mortality rate is regarded as a descriptive frequency measure, while incidence rate and relative risk are regarded as measures of comparative effect [6]. Cohort study analysis uses the ratio of the rate of disease in the exposed group to the rate in the unexposed group.
Incidence rate
In epidemiology, incidence simply means the occurrence of new cases of disease, for example, new cases of Ebola disease, Lassa fever, or injury in a population during a specified period. The incidence of a particular disease measures how quickly or frequently people develop the disease of interest. Unlike prevalence, incidence considers only new cases, and it has a unit. In order to measure the incidence of a disease, a cohort study should be conducted. The study includes participants who are at risk of developing the disease of interest, and they are then followed to determine who actually develops it. Incidence rate is one of the approaches to measuring the frequency of disease in a population: it measures how frequently the disease occurs in a population over a specified period. Incidence rates are subject to change over time, so the period covered by the cohort needs to be specified.
$\text{Incidence rate} = \frac{\text{Number of new cases during a given time period}}{\text{Total number of people in the population}} \times 10^{n}$
Example
Suppose 545 new cases of cholera were recorded among the civilian population of Yobe State, Nigeria, and that civilian population was estimated to be 828,262. The cholera incidence rate for this population is calculated from these data as follows.
$\text{Incidence rate} = \frac{545}{828{,}262} \times 10^{5}$
=0.000658 × 100,000 = 65.8 per 100,000
In the above example, 545 is the number of new cases of the disease diagnosed during the specified study period, while 828,262 is the population at risk. This implies that every person counted in the 828,262 must have been capable of developing the disease during the period covered.
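The calculation above can be reproduced with a short Python sketch. The function name `incidence_rate` and its `per` parameter are illustrative choices, not part of any standard library.

```python
def incidence_rate(new_cases, population_at_risk, per=100_000):
    """New cases divided by the population at risk, scaled to `per` people."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk * per


# Figures from the cholera example: 545 new cases in a population of 828,262.
rate = incidence_rate(545, 828_262)
print(f"{rate:.1f} per 100,000")
```

Changing `per` to 1,000 or 10,000 expresses the same rate on a different scale, which corresponds to choosing a different value of n in the formula.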
Relative risk (RR)
Relative risk, also called risk ratio, is a measure of association that compares the rates of disease in two groups. The rate for the group of primary interest (for example, the treatment group) is divided by the rate for a comparison group (for instance, the control group). Relative measures indicate how much more or less likely a person who is exposed to something is to experience a particular health outcome than a person who is not exposed. They give a clue about the strength of the relationship between the exposure and the outcome, but say nothing about the actual number of occurrences of disease in either group.
$\text{Relative risk} = \frac{\frac{\text{Number of exposed diseased persons}}{\text{Number of exposed persons}}}{\frac{\text{Number of non-exposed diseased persons}}{\text{Number of non-exposed persons}}}$
Example
The risk of lung cancer was reported as 2.2% among smokers and 0.8% among non-smokers in Jigawa state. The relative risk of lung cancer for the two groups of people (smokers versus non-smokers) is calculated as:
$\text{Relative risk (or risk ratio)} = 2.2\% / 0.8\% = 2.75$
The lung cancer risk in smokers is 2.75 times the risk in non-smokers. In other words, the result shows that smokers are more likely to develop lung cancer than non-smokers.
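A minimal Python sketch of the same calculation follows, using the 2.2% and 0.8% risks from the calculation above. The function name `relative_risk` is an illustrative choice.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Risk in the group of primary interest divided by risk in the comparison group."""
    if risk_unexposed <= 0:
        raise ValueError("risk_unexposed must be positive")
    return risk_exposed / risk_unexposed


# Risks expressed as proportions: 2.2% for smokers, 0.8% for non-smokers.
rr = relative_risk(0.022, 0.008)
print(f"Relative risk = {rr:.2f}")
```

A value above 1.0 points to a higher risk in the exposed group, a value below 1.0 to a lower risk, and a value of exactly 1.0 to no difference between the groups.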
Mortality rate
This is one of the frequency measures that quantify the occurrence of deaths in a given population. [7] defined mortality rate as a measure of the frequency of death occurring in a defined population during a specified time interval. To calculate the mortality rate, it is necessary to know the size of the population in which the deaths occur and the total number of deaths during a given period.
$\text{Mortality rate} = \frac{\text{Number of deaths in a period}}{\text{Number of person-years}} \times 10^{n}$
Example
Table 1 will be used to calculate the mortality rate for maternal deaths in Kano state.
From Table 1, the mortality rate for the entire population is calculated as:
$\text{Mortality rate} = \frac{\text{Number of maternal deaths}}{\text{Population}} \times 10^{n}$
$= \frac{53}{1597} \times 10^{5}$
Therefore, the maternal mortality rate for the given population is 3,318.7 deaths per 100,000 population.
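The overall rate can be checked with the Python sketch below, which uses the counts in Table 1. The age-specific rates it also computes are an addition for illustration and are not reported in this article; the population counts are used as the denominator in place of person-years, as in the worked example.

```python
# (maternal deaths, population) per age group, as given in Table 1.
table_1 = {
    "15-25": (21, 782),
    "26-35": (17, 540),
    "36-45": (14, 231),
    ">=46": (1, 44),
}


def mortality_rate(deaths, population, per=100_000):
    """Deaths divided by population, scaled to `per` people."""
    return deaths / population * per


total_deaths = sum(deaths for deaths, _ in table_1.values())
total_population = sum(pop for _, pop in table_1.values())
overall_rate = mortality_rate(total_deaths, total_population)

# Age-specific rates (illustrative addition, not reported in the article).
age_specific = {
    group: mortality_rate(deaths, pop) for group, (deaths, pop) in table_1.items()
}

print(f"Overall: {overall_rate:.1f} per 100,000")
for group, rate in age_specific.items():
    print(f"{group}: {rate:.1f} per 100,000")
```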
Case-control study: the odds ratio (OR)
The odds ratio is a widely used measure of association in case-control studies [8]. It is a relative measure of risk used to determine how likely a person who is exposed to a factor is to develop the outcome compared with a person who is not exposed. It is used to evaluate the risk of a particular disease (or outcome) when a certain factor (or exposure) is present. When events are rare, the risk and the odds are very similar, and relative risks are much easier to interpret than odds ratios. Thus, in many situations, researchers can interpret odds ratios as if they were relative risks.
| Age Group | Maternal Deaths | Population |
|---|---|---|
| 15 – 25 | 21 | 782 |
| 26 – 35 | 17 | 540 |
| 36 – 45 | 14 | 231 |
| ≥ 46 | 1 | 44 |
| Total | 53 | 1597 |

Table 1: Mortality rate for maternal deaths in Kano state.
The results of a case-control study can be presented in a 2×2 table as follows (Table 2):
The odds ratio is calculated as:
$OR = \frac{a/c}{b/d} = \frac{ad}{bc}$
a = number of subjects with both exposure of interest and disease
b = number of subjects with exposure of interest, but without disease
c = number of subjects without exposure of interest, but with the disease
d = number of subjects with neither the exposure of interest nor the disease
a + c = total number of subjects with disease (cases)
b + d = total number of persons without disease (controls)
The OR is a comparative effect measure and is therefore used to determine the strength of the relationship between exposure and outcome.

|  | Cases | Controls | Total |
|---|---|---|---|
| Exposed | a | b | a + b |
| Unexposed | c | d | c + d |
| Total | a + c | b + d | a + b + c + d |

Table 2: Case-control study results.
Example
Table 3 shows the numbers of females and males with and without the disease, together with the totals for each sex. The table is used to determine the odds ratio.
$\text{Odds ratio} = \frac{41 \times 1{,}156}{1{,}240 \times 16} = 2.4$
Table 3 can also be used to calculate the risk ratio. To calculate the risk ratio of lung cancer for females versus males, the risk of illness must first be calculated for females and for males.
$\text{Risk of illness among females} = \frac{a}{a+b} = \frac{41}{1{,}281} = 0.032$
$\text{Risk of illness among males} = \frac{c}{c+d} = \frac{16}{1{,}172} = 0.014$
Therefore, the risks of illness among females and males are 0.032 or 3.2%, and 0.014 or 1.4% respectively. Females are the group of principal interest while males are the comparison group.
$\text{Risk ratio} = \frac{3.2\%}{1.4\%} = 2.3$

| Sex | Lung cancer: Yes | Lung cancer: No | Total |
|---|---|---|---|
| Female | a = 41 | b = 1,240 | 1,281 |
| Male | c = 16 | d = 1,156 | 1,172 |

Table 3: Number of cases for lung cancer by sex.
The lung cancer risk in females is 2.3 times that in males. The results show that the odds ratio of 2.4 and the risk ratio of 2.3 are close to each other. This is one of the useful features of the odds ratio: when the outcome is uncommon, the odds ratio provides a good approximation of the relative risk.
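The closeness of the two measures can be verified with a short Python sketch that uses the cell counts a, b, c and d from Table 3. The function names are illustrative choices; the cell labels follow the 2×2 layout of Table 2.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc of a 2x2 table."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in the first row, a / (a + b), divided by risk in the second row, c / (c + d)."""
    return (a / (a + b)) / (c / (c + d))


# Counts from Table 3 (females in the first row, males in the second).
a, b, c, d = 41, 1240, 16, 1156

or_value = odds_ratio(a, b, c, d)
rr_value = risk_ratio(a, b, c, d)
print(f"Odds ratio = {or_value:.1f}, risk ratio = {rr_value:.1f}")
```

Because the outcome is uncommon in both rows, a + b is close to b and c + d is close to d, which is why the two ratios nearly coincide.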
Cross-sectional studies: prevalence
A cross-sectional study has to be representative of the whole population. Therefore, an appropriate probability sampling technique needs to be used to select a sample that represents the population. In random sampling, each element or participant has an equal chance of being included in the study through the use of a random selection procedure [9]. For instance, a study of the prevalence of hypertension among men aged 40–70 years in Kano city should comprise a random sample of all men aged 40–70 years in that city. Thus, every man in Kano city who falls within the stated age range (40–70 years) has an equal probability of being selected for the research.
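The idea of giving every member of the population an equal chance of selection can be sketched in Python with the standard library. The sampling frame of 5,000 identifiers, the sample size of 200 and the function name are all hypothetical; a real study would draw from an actual register of eligible men.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw `sample_size` distinct members; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: identifiers for all eligible men in the city.
frame = [f"resident_{i:04d}" for i in range(1, 5001)]
sample = simple_random_sample(frame, 200, seed=42)
print(sample[:5])
```

Sampling without replacement, as `random.Random.sample` does, ensures that no participant is selected twice.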
Prevalence is one of the three important measures that form the foundation of descriptive epidemiology [1]. The prevalence of a disease is the proportion of a population that has the disease of interest at a specific point in time. It is the main outcome measure obtained from a cross-sectional study and measures the occurrence of existing disease. It is influenced by the incidence and the duration of the condition [4]. Prevalence has no unit.
$\text{Prevalence} = \frac{\text{Number of people with cases at a given point in time}}{\text{Total number of people in the population}} \times 10^{n}$
Example
In a survey of patients at a sexually transmitted disease clinic in Kano state, 120 of the 360 patients interviewed reported using a condom at least once during the three months before the interview.
The prevalence of condom use in this population over the last three months is calculated as:
$\frac{120}{360} \times 10^{2} = 0.333 \times 100 = 33.3\%$
Therefore, the prevalence of condom use during the given study period was 33.3% in this population of patients [10,11].
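The same result can be obtained with a minimal Python sketch. The function name `prevalence` and its `per` parameter are illustrative choices.

```python
def prevalence(existing_cases, population, per=100):
    """Existing cases divided by the population surveyed, scaled to `per` (100 gives a percentage)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population * per


# Figures from the clinic survey: 120 of 360 patients reported condom use.
condom_use = prevalence(120, 360)
print(f"Prevalence = {condom_use:.1f}%")
```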