Statistics refers to the scientific and systematic methods of collecting, recording, summarizing, analyzing and presenting numerical data in a precise manner.

Or

The study of methods of collecting, recording, summarizing, analyzing and presenting data in a precise manner by using numbers.

Or

A science of observing, collecting, recording, summarizing, analyzing and presenting data in a precise manner by using numbers.

Numerical data are understood as a body of information given in numbers, or as exact numerical facts or figures collected systematically and arranged for a certain purpose.

**NATURE OF DATA**

**Statistical data according to their varied nature**

Statistical data according to their varied nature include the following:-

**Discrete data**

It is a form of statistical data for variables whose values are expressed in whole numbers, i.e. the data are for cases which do not exist in fractions.

For instance, the number of people can be given as 102; people cannot be divided into decimals or fractions.

**Continuous data**

These are data for variables whose values can be expressed in fractions or decimals. In this type of data, any value within the range can be given.

For instance, the data for temperature, rainfall, pressure, distance, growth rate and similar cases. They are presented on a continuous scale of fractions or decimals.

**Individual data**

The set of data which provides a specific value for every item in a given sample. For instance, Juma has a weight of 47 kg. Every item is considered an important entity and is presented singly.

**Grouped data**

It is a form of data which gives values in ranges or classes. This type of data is not precise, as no exact figures are quoted; instead the values fall within groups.

The classic example of grouped data is population distribution by age and sex, which may appear as follows:-

| AGE | FEMALES | MALES |
|---|---|---|
| 0 – 9 | 14,897 | 14,567 |
| 10 – 19 | 15,432 | 14,329 |
| 20 – 29 | 17,987 | 13,098 |
| 30 – 39 | 16,876 | 17,654 |

**Statistical data according to scale of measurements**

This aspect concerns how the values of statistical data are given.

**The scales of measurement include the following.**

**Nominal data**

The type of data, according to scale of measurement, in which the values are given according to the names of items in a given sample, e.g. 10 apples, 5 oranges, 7 mangoes, 5 bananas and 2 cherries.

**Ordinal data**

The data in which the values are given in order of magnitude, in such a way that the numbers indicate the rank order among objects, i.e. the values are commonly given in either ascending or descending order, e.g. 91, 82, 79, 74, 68, 67, 58, 54 and 49.

**Interval data**

The data in which values are given in ranges at regular distances by being grouped, e.g. the data for population distribution by age and sex expressed on an interval scale.

**Ratio data**

The data in which the values show the size of one item relative to another, e.g. 1:3, 2:5, 3:7, etc.

**VARIABLES**

A variable is an attribute whose values fluctuate under given conditions. For instance, production is a variable whose values change under conditions such as policy, climate, technology, marketability and others.

Variables vary considerably and are classified into dependent and independent variables.

**Dependent variable**

A dependent variable is one whose values fluctuate due to the force of another variable, i.e. the variable whose values change irregularly as controlled by another variable. For instance, production is one of the most pronounced dependent variables, as it changes due to the force of other variables like climate, the level of technology applied, the demand for the products, and others.

**CLASSIFICATION OF STATISTICS**

Statistics, being the scientific and systematic methods dealing with numerical facts, is broadly categorized into two depending on how the data are handled. The two broad categories are descriptive and inferential statistics.

**Descriptive Statistics**

Descriptive statistics deals with the recording, summarizing, analyzing and presentation of numerical facts that have actually been collected. An example of such actual collection is gathering population data by conducting a census.

**Inferential statistics**

Inferential statistics deals with the recording, summarizing, analyzing and presentation of numerical facts obtained by quantifying uncertainties through prediction, e.g. the likely harvest output in the next year or season.

**STATISTICAL DATA**

As already pointed out, statistical data are understood as exact numerical facts or figures collected systematically and arranged for a certain purpose, or as a body of information which is usually treated in numerical values.

Statistical data are extremely varied and are thus recognized to be of different types. The categories of statistical data are recognized with regard to their sources, varied nature and scale of measurement.

**Statistical data according to their varied sources**

Data by source are classified into two: **primary** and **secondary** data.

**Primary data**

These are the numerical facts collected from the field or handled for the first time, i.e. they are first-hand or original information. The data are not available in existing sources like books. Primary statistical data are gathered by the techniques of interview, questionnaires, observation, counting, measurement and other methods.

**Secondary data**

These are the numerical facts derived from stored sources. The data were compiled by other people who carried out research. The sources of this type of data include textbooks, reference books, magazines, maps, video tapes, audio tapes and similar sources.

**Independent variable**

An independent variable is one whose values change on their own without being influenced by another variable, i.e. the variable whose values change steadily and regularly, e.g. distance.

**SOURCES OF STATISTICAL DATA**

The sources of statistical data are simply the techniques employed to gather the numerical facts. These are broadly two: primary and secondary sources.

**Some of the techniques (sources) providing statistical data include the following:-**

- Interview method
- Questionnaire
- Scheduling
- Field observation method
- Literature review

**Interview method**

The technique of interview involves the collection of data through questions asked verbally by a researcher to a respondent.

Or

It is a verbal interaction between an interviewer and an interviewee designed to elicit the information, news, opinions and feelings the interviewee has. Generally, an interview is an oral set of questions asked to respondents by a researcher.

**Questionnaire method**

A questionnaire is a set of research questions printed on paper and presented to respondents, who reply to the questions in writing. Thus, the questionnaire method is a way of gathering statistical details by giving questionnaires to respondents to answer.

**Field observation method**

It is a method of gathering primary research data in which a researcher observes the phenomena directly. It is of two types: **participant** and **non-participant observation**.

**Scheduling method**

This method of data collection is very similar to the questionnaire, with one small difference: a schedule involves a prepared set of questions which are filled in by enumerators who are specially appointed for the purpose, carefully selected, and trained well enough to perform the job. This method of data collection is very useful for carrying out a population census.

The secondary sources providing statistical data include:

**Literature review method**

It is a systematic survey of past documentary sources, prepared by other researchers, that relate to the study. The documentary sources include textbooks, statistical abstracts, census reports, research articles, journals, newspapers and official reports.

Other methods of data collection include measurement, counting and the carrying out of experiments.

**Strengths of statistics application in Geography**

**Application of statistics in geography has the following vital significance**

It summarizes massive information by making it simpler, and thus enables geographers to handle large sets of data.

Statistics makes data computation techniques possible in geography.

Statistics makes the process of data comparison easy, since it is impossible to make a comparison without statistics on the variables being compared.

Statistics application facilitates the process of drawing relationships between geographical variables like climate and production, population and time, rainfall and temperature, etc.

Application of statistics makes data storage easy in the form of numbers, tables, graphs, diagrams and maps.

Application of statistics makes geographical data clearly understood and easy to analyze and interpret.

Statistics enhances the testing of the validity of geographical models, theories and concepts against real-world situations.

**STATISTICAL MEASURES**

The numerical values which make up statistics are analyzed or examined to judge their implications (results) by taking into consideration the statistical **measures**.

Thus, statistical measures refer to the computed numerical values used to analyze data in relation to other values in a given data set.

Statistical measures are numerous but, with regard to their nature and roles, they are broadly divided into the following categories.

- Measures of central tendency
- Measures of variability

**MEASURES OF CENTRAL TENDENCY**

These are the measures which show the central values; they include the arithmetic mean, mode and median.

**ARITHMETIC MEAN**

The arithmetic mean is the average of all values in a distribution. It is determined by adding up all the values and dividing by the number of observations added. The arithmetic mean is used to assess whether the values in a distribution are high or low.

**Computation of the arithmetic mean**

Computation of the arithmetic mean depends on the nature of the data given, whether ungrouped or grouped.

For an ungrouped data set, the arithmetic mean is computed by applying the following formula:

$$\bar{X} = \frac{\sum X}{N}$$

Whereby:

$\sum X$ = The sum of all values

N = The total number of observations added.

**Example:**

Find the arithmetic mean for the following set of data.

5,7,10,12,13,14,15,7, and 2.

**Solution**

The arithmetic mean for the data set above is calculated as follows:

$\sum X$ = 5 + 7 + 10 + 12 + 13 + 14 + 15 + 7 + 2 = 85

N = 9

$\bar{X} = 85 / 9 \approx 9.4$

**Thus: The Arithmetic mean = 9.4**
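The same calculation can be sketched in a few lines of Python. This is only an illustration of the ungrouped formula; the variable names are chosen here and are not part of the notes.

```python
# Ungrouped arithmetic mean: add up the values, then divide by how many there are.
values = [5, 7, 10, 12, 13, 14, 15, 7, 2]  # data set from the example

total = sum(values)            # sum of all values
n = len(values)                # N, the number of observations
arithmetic_mean = total / n

print(f"sum = {total}, N = {n}, mean = {arithmetic_mean:.1f}")
```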

For a grouped data set, the arithmetic mean is calculated by the following formula:

$$\bar{X} = \frac{\sum fX}{\sum f}$$

Whereby:

X = Class mark

f = Frequency

**Example:**

Find the arithmetic mean for the following scores of marks.

| Class interval | f | X | fX |
|---|---|---|---|
| 91 – 95 | 0 | 93 | 0 |
| 86 – 90 | 1 | 88 | 88 |
| 81 – 85 | 6 | 83 | 498 |
| 76 – 80 | 10 | 78 | 780 |
| 71 – 75 | 15 | 73 | 1095 |
| 66 – 70 | 34 | 68 | 2312 |
| 61 – 65 | 22 | 63 | 1386 |
| 56 – 60 | 10 | 58 | 580 |
| 51 – 55 | 2 | 53 | 106 |

Solution:-

According to the given data:

$\sum fX$ = 6845

$\sum f$ = 100

Thus, the arithmetic mean = 6845 / 100 = 68.45
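As a rough sketch, the grouped calculation can be written as a small Python function. The function name `grouped_mean` and the `(lower limit, upper limit, frequency)` layout are assumptions made for this illustration; the class mark is taken as the midpoint of the two class limits.

```python
def grouped_mean(classes):
    """Mean of grouped data given as (lower limit, upper limit, frequency) tuples."""
    sum_fx = sum(f * (low + high) / 2 for low, high, f in classes)  # sum of f * class mark
    sum_f = sum(f for _, _, f in classes)                           # total frequency
    return sum_fx / sum_f

# Scores of marks from the example above
score_classes = [
    (91, 95, 0), (86, 90, 1), (81, 85, 6), (76, 80, 10), (71, 75, 15),
    (66, 70, 34), (61, 65, 22), (56, 60, 10), (51, 55, 2),
]

print(grouped_mean(score_classes))
```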

**Advantages of the Arithmetic mean**

It is easy to calculate and most people understand it.

It is used to check whether the values are high or low.

It can be used for further calculation. For instance, the arithmetic mean is used to calculate the standard deviation.

**Disadvantages of the arithmetic mean**

The arithmetic mean has the big weakness of being pulled towards outliers (extreme scores).

It needs more mathematical knowledge to calculate the arithmetic mean for a grouped data set.

**MODE**

The mode is the value which occurs most frequently in a given data set.

Or

It is the most commonly attained measurement value in a data set.

Or

It is the measurement value that appears most often for a particular variable among a sample of subjects.

The mode helps us to know the concentration of values, which can stimulate scientific investigation.

**Calculation of a mode**

Determination of the mode depends on the nature of the data set, whether ungrouped or grouped.

For an ungrouped data set, the mode is obtained by taking the value that appears most frequently, i.e. the one with a higher frequency than the rest.

**Example:**

Determine the mode for the following data set.

2, 4, 2, 2, 5, 6, 4

| Value | Frequency (concentration) |
|---|---|
| 2 | 3 |
| 4 | 2 |
| 5 | 1 |
| 6 | 1 |

Thus, the mode for the given data set = 2

**Note**

Sometimes a given data set may have more than one mode or no mode at all. A distribution with one mode is known as unimodal or monomodal. If two modes are obtained from a data set, it is described as bimodal.

Example:

(1) 2, 5, 4, 3, 5, 6, 6, 8, 5, 6.

The modes for the data set are 5 and 6

(2) 4, 9, 8, 5, 6, 7

The given data set has no mode.
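The counting involved can be sketched in Python with `collections.Counter`. The helper name `modes` is hypothetical, and treating a data set in which no value repeats as having no mode is an assumption taken from the second example.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)            # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:                    # no value repeats: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 4, 2, 2, 5, 6, 4]))           # one mode (unimodal)
print(modes([2, 5, 4, 3, 5, 6, 6, 8, 5, 6]))  # two modes (bimodal)
print(modes([4, 9, 8, 5, 6, 7]))              # no mode
```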

For grouped data, the mode is found by the following formula:

$$\text{Mode} = L + \left(\frac{t_1}{t_1 + t_2}\right) \times i$$

Whereby:

· L = The lower limit (boundary) of the modal class

· $t_1$ = The excess of the modal frequency over the frequency of the next lower class

· $t_2$ = The excess of the modal frequency over the frequency of the next higher class

· i = The class interval

**Example:-**

The table below shows the scores in a geography test for Form V students.

| Class interval | Frequency |
|---|---|
| 40 – 44 | 7 |
| 45 – 49 | 8 |
| 50 – 54 | 11 |
| 55 – 59 | 10 |
| 60 – 64 | 4 |

**Solution**

The mode for the data set above is calculated as follows:-

According to the given data set, the modal class is 50 – 54, so:

L = 49.5

$t_1$ = 11 – 8 = 3

$t_2$ = 11 – 10 = 1

i = 5

Then:

$$\text{Mode} = 49.5 + \left(\frac{3}{3 + 1}\right) \times 5 = 49.5 + (0.75 \times 5) = 49.5 + 3.75 = 53.25$$

**Thus; the mode = 53.25**
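A minimal Python sketch of this procedure follows. The name `grouped_mode` is hypothetical, and the code assumes integer class limits listed in ascending order, so that the lower boundary is 0.5 below the lower limit and the class interval is `upper - lower + 1`.

```python
def grouped_mode(classes):
    """Mode of grouped data given as (lower limit, upper limit, frequency) tuples.

    Assumes ascending classes with integer limits and a single modal class
    whose frequency exceeds at least one neighbour (so t1 + t2 is not zero).
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                 # position of the modal class
    low, high, f_modal = classes[k]
    lower_boundary = low - 0.5                  # L
    width = high - low + 1                      # i, the class interval
    f_lower = freqs[k - 1] if k > 0 else 0      # frequency of the next lower class
    f_higher = freqs[k + 1] if k < len(freqs) - 1 else 0
    t1 = f_modal - f_lower
    t2 = f_modal - f_higher
    return lower_boundary + t1 / (t1 + t2) * width

marks = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(grouped_mode(marks))
```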

**Advantages of a mode**

It helps to determine the predominance of a certain geographical feature in a place.

It helps to know the number of occurrences of the values in a data set.

**Disadvantages of a mode**

It needs more mathematical knowledge to calculate the mode for a grouped data set.

It is an unreliable measure of central tendency, as a data set may have more than one mode or no mode at all.

**MEDIAN**

The median is the value that divides the other values in a distribution into two equal parts after they have been arranged in ascending or descending order.

**Computation of the median**

The computation of the median chiefly depends on the nature of the data set, whether ungrouped or grouped.

For an ungrouped data set, the calculation of the median should further take into account whether the number of values is odd or even.

If the number of values is odd, the median is simply the middle value, obtained after the values have been arranged in ascending or descending order.

E.g.

1, 2, 1, 4, 6, 5, 3

**Solution**

The ascending order of the values is as follows:-

1, 1, 2, 3, 4, 5, 6

**Thus; the median = 3.**

If the number of values is even, the median **is the average of the two middle values**, obtained after the values have been arranged in ascending or descending order.

E.g.

1,4,5,2,7,8,3,2

The ascending order for the values is as follows:-

1,2,2,3,4,5,7,8

Thus; the median = (3 + 4) / 2 = 3.5
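Both the odd and the even case can be sketched in one short Python function. The name `median` is chosen here for illustration only.

```python
def median(values):
    """Median of an ungrouped data set."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of the two middle values

print(median([1, 2, 1, 4, 6, 5, 3]))     # odd number of values
print(median([1, 4, 5, 2, 7, 8, 3, 2]))  # even number of values
```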

**Median determination for the grouped data**

For grouped data, the median is determined by applying the following formula:-

$$\text{Median} = L + \left(\frac{\frac{N}{2} - n_b}{n_w}\right) \times i$$

Whereby:-

L = The lower limit (boundary) of the median class

N = The total number of observations

$n_b$ = The number of elements in the classes below the median class

$n_w$ = The number of elements in the median class

i = The class interval

**Example:-**

The table below shows the scores in geography for Form V students.

| Class interval | Frequency |
|---|---|
| 40 – 44 | 7 |
| 45 – 49 | 8 |
| 50 – 54 | 11 |
| 55 – 59 | 10 |
| 60 – 64 | 4 |

**According to the given data**

The median class is 50 – 54, since N/2 = 20 falls within it.

L = 49.5

N = 40

$n_b$ = 15

$n_w$ = 11

i = 5

$$\text{Median} = 49.5 + \left(\frac{20 - 15}{11}\right) \times 5 \approx 49.5 + (0.45 \times 5)$$

49.5 + 2.25 = 51.75 (taking 5/11 ≈ 0.45; without rounding the result is about 51.77)

Thus the median ≈ 51.75
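The search for the median class and the interpolation inside it can be sketched in Python as below. The name `grouped_median` is hypothetical, and integer class limits in ascending order are assumed. Because the code does not round the fraction 5/11 to 0.45, it returns about 51.77 rather than 51.75.

```python
def grouped_median(classes):
    """Median of grouped data given as (lower limit, upper limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)       # N, total number of observations
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    below = 0                               # n_b, running count below the current class
    for low, high, f in classes:
        if below + f >= half:               # this is the median class
            lower_boundary = low - 0.5      # L
            width = high - low + 1          # i
            return lower_boundary + (half - below) / f * width
        below += f

marks = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(round(grouped_median(marks), 2))
```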

**Advantages of median**

It helps to identify the middle value among the numerous values in a data set.

It is easy to determine, particularly for a simple data set.

**Disadvantages of the median**

If the values are numerous, it becomes cumbersome to arrange them in ascending or descending order to get the median.

It needs more skill to determine the median for a grouped data set.

**MEASURES OF VARIABILITY**

These are the measures which assess the variation of values in a data set. The common measures of variability include the following:-

- Range
- Standard deviation
- Variance
- Mean deviation

**RANGE**

The range is the difference between the highest and lowest values in a given distribution. It is used to assess the variation between the highest score and the lowest score.

**Calculation of the range**

Calculation of the range also considers the nature of the data set, whether ungrouped or grouped.

**For an ungrouped data set**, the range is calculated by subtracting the lowest value from the highest value in the data set.

**Example:-**

Determine the range for the following data set: 4, 2, 3, 5, 6, 4, 8

**Solution**

The range for the given data set is computed as follows:-

Range = Highest value – Lowest value

According to the given data set:-

· Highest value = 8

· Lowest value = 2

· 8 – 2 = 6

· Thus; the range = 6

A high range implies greater variation; a small range implies small variation.
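In Python the ungrouped range reduces to one subtraction using the built-in `max` and `min`; the variable names below are illustrative.

```python
values = [4, 2, 3, 5, 6, 4, 8]   # data set from the example

highest = max(values)
lowest = min(values)
value_range = highest - lowest   # Range = highest value - lowest value

print(highest, lowest, value_range)
```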

For grouped data, the range is calculated by subtracting the lowest class mark from the highest class mark, by subtracting the lowest lower boundary from the highest lower boundary, or by subtracting the lowest upper boundary from the highest upper boundary.

**Example:-**

Determine the range for the following data set.

| Class interval |
|---|
| 10 – 14 |
| 15 – 19 |
| 20 – 24 |
| 25 – 29 |
| 30 – 34 |
| 35 – 39 |

**Solution**

The range for the given data set is calculated as follows:

Range = Highest class mark – Lowest class mark

Determination of the class marks

| Class interval | Class mark |
|---|---|
| 10 – 14 | 12 |
| 15 – 19 | 17 |
| 20 – 24 | 22 |
| 25 – 29 | 27 |
| 30 – 34 | 32 |
| 35 – 39 | 37 |

According to the computed class marks:

· Highest class mark = 37

· Lowest class mark = 12

37 – 12 = 25

**Thus, the range = 25**
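A short Python sketch of the class-mark approach is given below, with each class mark taken as the midpoint of its class limits. The names are assumptions for illustration. For equal-width classes like these, the boundary-based alternatives give the same result.

```python
# Class intervals from the example, as (lower limit, upper limit) pairs
intervals = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

class_marks = [(low + high) / 2 for low, high in intervals]   # midpoints of each class
grouped_range = max(class_marks) - min(class_marks)           # highest mark - lowest mark

print(class_marks, grouped_range)
```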

**Advantages of a range**

The range gives a quick, rough estimate of variability.

It is simple to calculate and most people are familiar with it.

**Disadvantages of a range**

It considers only two values, the highest and the lowest, and is thus not sensitive to the whole distribution.

It is affected by extreme values.

**STANDARD DEVIATION**

A deviation is the difference between a value and the mean. It is computed by subtracting the mean from the value:

$$d = X - \bar{X}$$

Whereby:-

X = A value given in a distribution

$\bar{X}$ = The average of all values

**Standard deviation**

The standard deviation refers to the typical difference of all values from the mean. It is the root mean square deviation from the mean. It is the measure which determines how far the values are scattered from the mean.

Standard deviation is represented by the sigma symbol, $\sigma$.

**Computation of a standard deviation**

Calculation of the standard deviation also depends on the nature of the data set, whether ungrouped or grouped.

For ungrouped data, the standard deviation is calculated by the following formula:

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

Whereby:-

X = A value in a distribution

$\bar{X}$ = The mean of the values

N = The total number of observations

**Example:-**

Calculate the standard deviation for the following data set.

3, 2, 1, 4, 6

**Solution**

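Mean determination

$\bar{X}$ = (3 + 2 + 1 + 4 + 6) / 5 = 16 / 5 = 3.2

| X | 3 | 2 | 1 | 4 | 6 |
|---|---|---|---|---|---|
| $X - \bar{X}$ | -0.2 | -1.2 | -2.2 | 0.8 | 2.8 |
| $(X - \bar{X})^2$ | 0.04 | 1.44 | 4.84 | 0.64 | 7.84 |

$\sum (X - \bar{X})^2$ = 0.04 + 1.44 + 4.84 + 0.64 + 7.84 = 14.8

Then:

$$\sigma = \sqrt{\frac{14.8}{5}} = \sqrt{2.96} \approx 1.720$$

Hence; the SD ≈ 1.720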
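The steps (mean, deviations, squared deviations, root of their average) can be checked with a short Python sketch. The function name `population_sd` is hypothetical, and it divides by N, not N − 1, in line with the root-mean-square definition used here. For 3, 2, 1, 4, 6 the squared deviations sum to 14.8, so the function returns about 1.720.

```python
import math

def population_sd(values):
    """Root mean square deviation from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)

data = [3, 2, 1, 4, 6]
print(round(population_sd(data), 3))
```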

For a grouped data set, the standard deviation is computed by the following formula:-

$$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}}$$

**Example:-**

Calculate the SD for the following set of grouped data.

| Class interval | Frequency |
|---|---|
| 40 – 44 | 7 |
| 45 – 49 | 8 |
| 50 – 54 | 11 |
| 55 – 59 | 10 |
| 60 – 64 | 4 |

**Procedure:**

· Determination of the mean

| Class interval | f | X | fX |
|---|---|---|---|
| 40 – 44 | 7 | 42 | 294 |
| 45 – 49 | 8 | 47 | 376 |
| 50 – 54 | 11 | 52 | 572 |
| 55 – 59 | 10 | 57 | 570 |
| 60 – 64 | 4 | 62 | 248 |

$\sum fX$ = 2060 and $\sum f$ = 40

Hence; $\bar{X}$ = 2060 / 40 = 51.5

Then:-

| X | 42 | 47 | 52 | 57 | 62 |
|---|---|---|---|---|---|
| $X - \bar{X}$ | -9.5 | -4.5 | 0.5 | 5.5 | 10.5 |
| $(X - \bar{X})^2$ | 90.25 | 20.25 | 0.25 | 30.25 | 110.25 |
| $f(X - \bar{X})^2$ | 631.75 | 162 | 2.75 | 302.5 | 441 |

$\sum f(X - \bar{X})^2$ = 1540

$\sum f$ = 40

$$\sigma = \sqrt{\frac{1540}{40}} = \sqrt{38.5} \approx 6.205$$

**Thus; The SD ≈ 6.205**
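The grouped procedure can be sketched in Python as follows. The name `grouped_sd` and the `(lower limit, upper limit, frequency)` layout are assumptions for this illustration, with class marks taken as class midpoints.

```python
import math

def grouped_sd(classes):
    """Standard deviation of grouped data given as (lower, upper, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    marks = [((low + high) / 2, f) for low, high, f in classes]   # (class mark, frequency)
    mean = sum(x * f for x, f in marks) / total_f
    weighted_squares = sum(f * (x - mean) ** 2 for x, f in marks)  # sum of f * (X - mean)^2
    return math.sqrt(weighted_squares / total_f)

marks_table = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(round(grouped_sd(marks_table), 3))
```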

**Note:-**

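The square of the SD is known as the variance. Its computation also considers the nature of the data set, whether ungrouped or grouped.

For ungrouped data, the variance is computed by the following formula:-

$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$

For grouped data, the same quantity is weighted by the class frequencies:

$$\sigma^2 = \frac{\sum f(X - \bar{X})^2}{\sum f}$$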
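The link between the two measures can be illustrated in Python: the variance is the mean of the squared deviations, and the standard deviation is its square root, so squaring the SD recovers the variance. The data set 3, 2, 1, 4, 6 and the variable names are used here only as an illustration.

```python
import math

values = [3, 2, 1, 4, 6]
n = len(values)
mean = sum(values) / n

variance = sum((x - mean) ** 2 for x in values) / n   # mean of squared deviations
standard_deviation = math.sqrt(variance)              # SD is the square root of the variance

print(round(variance, 2), round(standard_deviation, 3))
```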

**MEAN DEVIATION**

The mean deviation is the average of all deviation values, i.e. the amount by which the individual values deviate from the mean irrespective of sign. It is computed by dividing the sum of all deviations, irrespective of sign, by the number of observations.

**Calculation of mean deviation**

Calculation of the mean deviation also depends on the nature of the data set, whether ungrouped or grouped.

For an ungrouped data set, the mean deviation is calculated by the following formula:-

$$\text{MD} = \frac{\sum |X - \bar{X}|}{N}$$

Example:-

Determine the mean deviation for the following data set: 4, 7, 8, 2, 9, 6

Solution

Mean determination

4 + 7 + 8 + 2 + 9 + 6 = 36

Hence; the mean = 36 / 6 = 6

Deviations determination

| X | $X - \bar{X}$ | d = $\lvert X - \bar{X} \rvert$ |
|---|---|---|
| 4 | 4 – 6 | 2 |
| 7 | 7 – 6 | 1 |
| 8 | 8 – 6 | 2 |
| 2 | 2 – 6 | 4 |
| 9 | 9 – 6 | 3 |
| 6 | 6 – 6 | 0 |

The sum of deviations determination.

· 2 + 1 + 2 + 4 + 3 + 0 = 12

Then:

$$\text{MD} = \frac{12}{6} = 2$$

**Thus; the mean deviation = 2**
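A minimal Python sketch of the same steps is shown below; `abs` drops the sign of each deviation. The function name `mean_deviation` is chosen for this illustration.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n   # signs ignored via abs()

print(mean_deviation([4, 7, 8, 2, 9, 6]))
```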

For **the grouped data set**, the mean deviation is computed by the following formula:-

$$\text{MD} = \frac{\sum f\,|X - \bar{X}|}{\sum f}$$

**Example:-**

| Class interval | Frequency |
|---|---|
| 40 – 44 | 7 |
| 45 – 49 | 8 |
| 50 – 54 | 11 |
| 55 – 59 | 10 |
| 60 – 64 | 4 |

Determination of the mean

| Class interval | f | X | fX |
|---|---|---|---|
| 40 – 44 | 7 | 42 | 294 |
| 45 – 49 | 8 | 47 | 376 |
| 50 – 54 | 11 | 52 | 572 |
| 55 – 59 | 10 | 57 | 570 |
| 60 – 64 | 4 | 62 | 248 |

Hence; the mean = 2060 / 40 = 51.5

Determination of the deviations.

Whereby:

X = Class mark

d = $\lvert X - \bar{X} \rvert$, the deviation irrespective of sign

| X | $X - \bar{X}$ | d | f | fd |
|---|---|---|---|---|
| 42 | 42 – 51.5 | 9.5 | 7 | 66.5 |
| 47 | 47 – 51.5 | 4.5 | 8 | 36 |
| 52 | 52 – 51.5 | 0.5 | 11 | 5.5 |
| 57 | 57 – 51.5 | 5.5 | 10 | 55 |
| 62 | 62 – 51.5 | 10.5 | 4 | 42 |

· The sum of (fd) determination

66.5 + 36 + 5.5 + 55 + 42 = 205

Then:

$$\text{MD} = \frac{205}{40} = 5.125$$

**Thus; The mean deviation = 5.125**
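The grouped version weights each absolute deviation by its class frequency. A possible Python sketch follows, where the name `grouped_mean_deviation` and the tuple layout are assumptions for illustration and the class marks are class midpoints. For the class with mark 52 the frequency is 11, so its contribution is 11 × 0.5 = 5.5.

```python
def grouped_mean_deviation(classes):
    """Mean deviation of grouped data given as (lower, upper, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    marks = [((low + high) / 2, f) for low, high, f in classes]   # (class mark, frequency)
    mean = sum(x * f for x, f in marks) / total_f
    sum_fd = sum(f * abs(x - mean) for x, f in marks)             # sum of f * |X - mean|
    return sum_fd / total_f

marks_table = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(grouped_mean_deviation(marks_table))
```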
