Introduction to Statistics and Data Analysis
MATH30-6 Probability and Statistics

Objectives
At the end of the lesson, the students are expected to:
• Define basic terms and phrases used in statistics;
• Identify the importance of statistics in everyday life;
• Compare and contrast descriptive and inferential statistics; and
• Explain the concepts of methods of data collection and presentation.

Statistics
The field of statistics deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes. In simple terms, statistics is the science of data.

Branches of Statistics
Descriptive Statistics (DS)
• Concerned with describing the characteristics and properties of a group of persons, places, or things.
• Based on easily verifiable facts or meaningful information.
• Does not draw inferences or conclusions about a larger set of data.

Descriptive Statistics Examples
• How many passed the recent Electrical Engineering Licensure examination?
• In Applied Life Data Analysis (Wiley, 1982), Wayne Nelson presents the breakdown time of an insulating fluid between electrodes at 34 kV. The times, in minutes, are as follows: 0.19, 0.78, 0.96, 1.31, 2.78, 3.16, 4.15, 4.67, 4.85, 6.50, 7.35, 8.01, 8.27, 12.06, 31.75, 35.52, 33.91, 36.71, and 72.89.

Inferential Statistics (IS)
• Draws inferences about a population based on the data gathered from samples, using the techniques of DS.
• Composed of those methods concerned with the analysis of a smaller group of data leading to predictions or inferences about the larger set of data.
• Deals with generalizing about the whole group from an analysis of part of the group.

Inferential Statistics Examples
• Is there a significant correlation between the amount of time spent studying and the final grade in a computer programming course?
• A study shows that ABET-accredited programs draw more students to enrol in such programs at Mapúa Institute of Technology.

Population and Sample
Population
• Totality of all observations from which the dataset is acquired
• All of the possible events should be considered.
• A variable that describes a population is known as a parameter.
Example: There are 5,786 students enrolled in MATH10-1.
Population: Students of MATH10-1
Parameter: 5,786 (population size)

Sample
• Small group taken from the population
• A group, as heterogeneous as possible, taken from the large group to represent the population
• A variable that describes a sample is known as a statistic.
Example: Of the 5,786 students enrolled in MATH10-1, 3,456 are females.
Sample: Female students in MATH10-1
Statistic: 3,456 (sample size)

Variables
Variables are the characteristics being studied in statistics.

Qualitative Variables
• Also known as categorical data; the values are nonnumeric and describe a quality or category
• Examples are preferences, gender, civil status, and location.

Quantitative Variables
• Also known as numerical data: information and observations that are countable or measurable quantities
• Examples are force, weight, height, voltage, current, resistance, tensile strength, and grades.

Examples: Classify as Quantitative (QN) or Qualitative (QL).
• Weekly allowance
• Income of parents
• Religion
• Age
• Address
• Educational attainment
• Jobs
• Schools attended

Categories of Quantitative Data
Continuous Data
• Measurable quantities that can take infinitely many values within an interval.
• Data that have been measured by analog devices and have infinite values based on interpolations
• Examples are height, weight, and ratio of persons.

Discrete Data
• Countable quantities. Have finite equal intervals.
• Data that have been measured by a digital measuring device that tends to have exact values
• Examples are number of individuals and months of the year.

Dependent vs Independent Variable
Independent Variable
• A naturally occurring phenomenon that can be altered by increasing or decreasing its magnitude.
Dependent Variable
• A variable that is observed in response to the changes applied to the independent variable.
Example: The number of hours spent studying (independent) and test scores (dependent).

Controlled Variable
• Kept constant so that it does not influence the effect of the independent variable on the dependent variable
Extraneous Variable
• Would have minimal effect on the relationship between the independent and dependent variables

Scales of Measurement
• Nominal - Assigning numbers to categorical data.
• Ordinal - Assigning rank to the levels of data.
• Interval - Assigning a constant difference between numeric data.
• Ratio - Assigning numeric values measured from a non-arbitrary zero point.

Nominal Data
• Commonly categorical data assigned to numbers.
• The applicable measurement is simply counting the number of times a data value falls in each category, like assigning 1 for males and 2 for females.
• Other examples include course, civil status, color, and preference.

Ordinal Data
• Quantities where the numbers are used to designate the rank order of the data
• The correlation or the effect of the ranking of one variable can be measured. However, the range for each rank is not constant.
• Examples are results of a race, ranking in a beauty pageant, and level of hardness of a material on the Mohs scale.

Interval Data
• The range between the numeric values is constant.
• Addition and subtraction are applicable, but multiplication and division are not.
• Multiplication and division can only be applied to the differences between values.
• Zero point is arbitrary.
• Examples include years (1994, 2004, etc.), times (00:00, 20:00, etc.),
and temperature in Celsius and Fahrenheit scales.

Ratio Data
• Widely used data in science and engineering
• Almost all the basic mathematical operations can be performed on this data type.
• There is a non-arbitrary zero point.
• Examples include length, mass, angles, charge, and energy.

Sampling
Sampling is the process of taking samples from the population.
• Probability Sampling - All the possible events are listed and each is given a chance of being selected as part of the sample, which eliminates bias against events that would otherwise have no chance of being selected.
• Non-Probability Sampling - In this type of sampling technique, an individual may be certain to be selected or may have no chance of being selected to be part of the sample.

Probability Sampling
• Simple Random Sampling
- Performed by arranging the population according to a certain rule, numbering each element, and taking a sample by various randomizing principles.
- Examples of randomizing methods are a table of random numbers, the random number generator in computers and calculators, and the lottery or fish bowl technique.
- Each event in the population has an equal chance of being selected as part of the sample.
• Systematic Sampling
- Done by arranging the population in a certain order; the sample is taken by dividing the population into equal groups and obtaining the kth element in each group
Examples:
- Getting the temperature of the device every 4 hours
- Getting the voltage of the signal at a constant interval and converting it to another signal
• Stratified Sampling
- Done by grouping the population into strata, subpopulations with generally homogeneous or similar characteristics
- After dividing the population into several strata, random sampling is performed in each stratum, with a sample size proportional to the size of each stratum relative to the population.
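The three probability sampling techniques described so far can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function names, the `seed` parameter, and the numbered population of 100 elements are assumptions made for the example, and the stratified version uses simple rounding, so the stratum sample sizes may not always add up exactly to the requested total.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n elements so that every element has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Split the ordered population into n equal groups of size k and take
    the element at the same (randomly chosen) position in each group."""
    rng = random.Random(seed)
    k = len(population) // n      # group size, i.e. the sampling interval
    start = rng.randrange(k)      # random position within the first group
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the whole population.

    strata: dict mapping a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, share)
    return sample


# Hypothetical population: elements numbered 0 to 99.
population = list(range(100))
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
print(stratified_sample(
    {"x": population[:20], "y": population[20:70], "z": population[70:]},
    10,
    seed=1,
))
```

In the systematic sample, consecutive elements are always k positions apart, which is what distinguishes it from the simple random sample drawn from the same population.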
Probability Sampling
• Stratified Sampling
Example: A survey will be conducted to find out if families living in a certain city are in favor of the construction of a manufacturing plant. To ensure that all income groups are represented, respondents will be divided into:
Class A – high income
Class B – middle income
Class C – low income

Strata    Number of Families
A         1000
B         2500
C         1500
          N = 5000

- Using a 5% margin of error, how many families should be included in the survey? Use Slovin’s formula: $n = \frac{N}{1 + N e^{2}}$, where N is the population size and e is the margin of error.
- Using proportional allocation, how many from each group should be taken as samples?

• Cluster Sampling
- Done by identifying groups called clusters, subpopulations whose elements are as heterogeneous or diverse as possible
- The clusters must be similar to each other with respect to the parameter being examined.
- A cluster or clusters will be selected as the sample.
- Preferred since surveying only the selected clusters saves the time and money of going to every cluster
- Example: Selection of a certain region.

Non-Probability Sampling
• Convenience Sampling
- Based primarily on the availability of the respondents
- Used because of the convenience it offers to the researcher
- Example: Gathering data through telephone.
• Quota Sampling
- There is a desired number of samples, and respondents are taken as they volunteer to become part of the experiment.
- Similar to stratified random sampling
- Example: Phone call survey where the first 100 callers are taken
• Purposive Sampling
- The sample is obtained based on a certain premise.
- Example: A study about pregnant women where the male population would have zero chance of being selected as part of the survey

Summary
• There are two fields of Statistics: Descriptive and Inferential Statistics.
• Population is the totality of all observations from which the dataset is acquired. Sample is a subset of population.
• Variables are classified as quantitative or qualitative and independent or dependent.
• The scales of measurement are nominal, ordinal, interval, and ratio.
• Sampling techniques are classified as probability (random, systematic, stratified, and cluster) and non-probability (convenience, quota, and purposive).

References
• Montgomery and Runger. Applied Statistics and Probability for Engineers, 5th Ed. © 2011
• http://en.wikipedia.org/wiki/Statistics
• http://writing.colostate.edu/guides/research/stats/index.cfm
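Worked sketch for the stratified sampling example (strata A, B, and C with 1000, 2500, and 1500 families, N = 5000, 5% margin of error). The code below applies Slovin's formula and then proportional allocation. Two choices here are assumptions rather than part of the lesson: the sample size is rounded up to the next whole family, and each stratum's share is rounded to the nearest whole number. The function names are illustrative only.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up (assumed convention)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


def proportional_allocation(strata_sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }


strata = {"A": 1000, "B": 2500, "C": 1500}
n = slovin_sample_size(sum(strata.values()), 0.05)
allocation = proportional_allocation(strata, n)

print(n)           # 371, since 5000 / 13.5 is about 370.37
print(allocation)  # {'A': 74, 'B': 186, 'C': 111}
```

With these numbers the three allocations add up to 371. For other inputs, simple rounding can leave the total one above or below n, in which case the allocation needs a manual adjustment.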