Data preparation and preliminary data analysis
7.1 Chapter summary
After developing an appropriate questionnaire and pilot testing it, researchers need to undertake the field study and collect the data for analysis. In this chapter, we shall focus on the fieldwork and data collection process. Once the data is collected, editing and coding procedures are needed to input it into the appropriate statistical software, and once entered, the data must be checked before the final analysis is carried out. This chapter therefore also deals with how to code, input and clean the data. It will further discuss preliminary data checks such as normality and outlier checks. The last section of this chapter will focus on preliminary data analysis techniques such as frequency distribution and will also discuss hypothesis testing using various analysis techniques.
7.2 Survey fieldwork and data collection
As stated earlier, many marketing research problems require the collection of primary data, and surveys are one of the most widely employed techniques for collecting it. Primary data collection in marketing research therefore requires fieldwork. In marketing (especially in corporate research), primary data is rarely collected by the person who designed the research. It is generally collected either by people in the research department or by an agency specialising in fieldwork. Issues have been raised with regard to fieldwork and ethics. If a proper recruitment procedure is followed, such concerns rarely arise. The process of data collection can be divided into four stages: (a) selection of fieldworkers; (b) training of fieldworkers; (c) supervision of fieldworkers; and (d) evaluation of fieldwork and fieldworkers.
Prior to selecting any fieldworker, the researcher must be clear as to what kind of fieldworker will be suitable for a particular study. This is critical in the case of personal and telephone interviews because the respondent must feel comfortable interacting with the fieldworker. Researchers often leave fieldworkers to their own devices, and this can have a direct impact on the overall response rate and the quality of the data collected. It is very important for the researcher to train the fieldworker with regard to what the questionnaire and the study aim to achieve. Most fieldworkers have little idea of what exactly the research process is and, if not trained properly, they might not conduct the interviews in the correct manner. Researchers have prepared guidelines for fieldworkers on asking questions. The guidelines [72] include:
a. Be thoroughly familiar with the questionnaire.
b. Ask the questions in the order in which they appear in the questionnaire.
c. Use the exact wording given in the questionnaire.
d. Read each question slowly.
e. Repeat questions that are not understood.
f. Ask every applicable question.
g. Follow instructions and skip patterns, probing carefully.
The researcher should also train the fieldworkers in probing techniques. Probing helps to motivate the respondent and to focus on a specific issue. However, if not done properly, it can introduce bias into the process. There are several probing techniques [73]:
a. Repeating the question
b. Repeating the respondents' reply
c. Boosting or reassuring the respondent
d. Eliciting clarification
e. Using a pause (silent probe)
f. Using objective/neutral questions or comments
The fieldworkers should also be trained in how to record the responses and how to terminate the interviews politely. A trained fieldworker can be a real asset to the whole research process, in contrast to a fieldworker who feels disengaged from it.
It is important to remember that fieldworkers are generally paid on an hourly or daily basis and, in many cases, at minimum wage. Therefore, their motivation to conduct the interviews may not be as high as that of a researcher overseeing the whole process. This raises the issue of supervision, through which researchers can keep control over the fieldworkers by making sure that they follow the procedures and techniques in which they were trained. Supervision offers advantages in terms of quality control, keeping a check on the ethical standards employed in the field, and control over cheating.
The fourth issue with regard to fieldwork is the evaluation of fieldwork and fieldworkers. Evaluating fieldwork is important from the perspective of the authenticity of the interviews conducted. The researcher can call 10-20% of the sample respondents to inquire whether the fieldworker actually conducted the interviews. The supervisor could also ask several questions from the questionnaire to reconfirm the authenticity of the data. The fieldworkers should be evaluated on the total cost incurred, response rates, and the quality of the interviewing and of the data.
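The callback check described above can be sketched in a few lines of Python. This is only an illustration: the function name `select_for_callback`, the default fraction of 15% and the use of simple random sampling are assumptions made for the example, not a procedure prescribed in this chapter.

```python
import random


def select_for_callback(respondent_ids, fraction=0.15, seed=None):
    """Pick a random subset of completed interviews for verification calls.

    `fraction` is the share of respondents to call back (assumed here to lie
    in the 10-20% range mentioned in the text).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    ids = list(respondent_ids)
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    # Always call back at least one respondent, even in very small samples.
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))


# Hypothetical study with 200 completed interviews numbered 1 to 200.
callbacks = select_for_callback(range(1, 201), fraction=0.15, seed=1)
print(len(callbacks), "respondents selected for callback")
```

Drawing the callback list at random, rather than letting the fieldworker or supervisor choose it, keeps the check from being steered towards interviews that are known to be genuine.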
7.3 Nature and scope of data preparation
Once the data is collected, researchers' attention turns to data analysis. If the project has been organized and carried out correctly, the analysis planning has already been done using the pilot test data. However, once the final data has been captured, researchers cannot start analysing it straightaway. Several steps are required to prepare the data for analysis. These steps generally involve data editing and coding, data entry, and data cleaning.
The steps stated above help in creating a dataset which is ready for analysis. It is important to follow these steps in data preparation because incorrect data can result in incorrect analysis and wrong conclusions, undermining the objectives of the research and leading to wrong decisions by the manager.
7.3.1 Editing
The usual first step in data preparation is to edit the raw data collected through the questionnaire. Editing detects errors and omissions, corrects them where possible, and certifies that minimum data quality standards have been achieved. The purpose of editing is to generate data which is: accurate; consistent with intent of the question and other information in the survey; uniformly entered; complete; and arranged to simplify coding and tabulation.
Sometimes it becomes obvious that an entry in the questionnaire is incorrect or entered in the wrong place. Such errors could have occurred in interpretation or recording. When responses are inappropriate or missing, the researcher has three choices:
(a) The researcher can sometimes detect the proper answer by reviewing the other information in the schedule. This practice, however, should be limited to those few cases where it is obvious what the correct answer is.
(b) The researcher can contact the respondent for the correct information, provided that identification information has been collected and that time and budget allow.
(c) The researcher can strike out the answer if it is clearly inappropriate. Here an editing entry of 'no answer' or 'unknown' is called for. This procedure, however, is not very useful if the sample size is small, as striking out an answer generates a missing value and often means that the observation cannot be used in the analyses that contain this variable.
One of the major editing problems concerns the faking of interviews. Such fake interviews are hard to spot until they reach the editing stage, and if the interview contains only tick boxes it becomes very difficult to spot such fraudulent data. One of the best ways to tackle fraudulent interviews is to add a few open-ended questions to the questionnaire. These are the most difficult to fake. Distinctive response patterns in other questions will often emerge if faking is occurring. To uncover this, the editor must analyse the instruments used by each interviewer.
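The cost of striking out answers can be made concrete with a small sketch. The records, variable names and the `usable_cases` helper below are invented for illustration; the sketch simply assumes that a struck-out answer is stored as `None` and that an analysis can only use observations that are complete on every variable it involves.

```python
MISSING = None  # editing entry standing for 'no answer' / 'unknown'

# Made-up records for illustration only.
responses = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": MISSING, "satisfaction": 5},
    {"id": 3, "age": 51, "satisfaction": MISSING},
    {"id": 4, "age": 27, "satisfaction": 2},
]


def usable_cases(records, variables):
    """Return the records that have a value on every listed variable."""
    return [
        record for record in records
        if all(record[name] is not MISSING for name in variables)
    ]


print(len(usable_cases(responses, ["age"])))                  # analysis on age only
print(len(usable_cases(responses, ["age", "satisfaction"])))  # analysis using both
```

With only four observations, two struck-out answers already halve the number of cases available to an analysis that uses both variables, which is why this option suits large samples better than small ones.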
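One simple way to look at response patterns interviewer by interviewer is sketched below. The heuristic used here, the share of questionnaires in which every closed question received the same answer, is an assumption chosen for the example and is not a test proposed in this chapter. The function name and the data are likewise hypothetical.

```python
from collections import defaultdict


def identical_answer_share(questionnaires):
    """Map each interviewer to the share of their questionnaires in which
    all closed questions carry exactly the same answer."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for interviewer, answers in questionnaires:
        totals[interviewer] += 1
        if len(set(answers)) == 1:  # every item has the same code
            flagged[interviewer] += 1
    return {name: flagged[name] / totals[name] for name in totals}


# Made-up data: (interviewer, answers to four 5-point items).
collected = [
    ("A", [3, 3, 3, 3]),
    ("A", [4, 2, 5, 1]),
    ("B", [3, 3, 3, 3]),
    ("B", [3, 3, 3, 3]),
]
print(identical_answer_share(collected))
```

A high share for one interviewer does not prove faking. It only indicates which interviewer's questionnaires the editor may want to examine first, for example through the callback procedure or the open-ended questions.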
7.3.2 Coding
Coding involves assigning numbers or other symbols to answers so the responses can be grouped into a limited number of classes or categories. Specifically, coding entails the assignment of numerical values to each individual response for each question within the survey. The classifying of data into limited categories sacrifices some data detail but is necessary for efficient analysis. Instead of recording the word male or female in response to a question that asks for the identification of one's gender, we could use the codes 'M' or 'F'. Normally this variable would be coded 1 for male and 2 for female, or 0 and 1. Similarly, a Likert scale can be coded as: 1 = strongly disagree; 2 = disagree; 3 = neither agree nor disagree; 4 = agree; and 5 = strongly agree. Coding the data in this format helps the overall analysis process, as most statistical software packages handle numbers easily. Coding helps the researcher to reduce several thousand replies to a few categories containing the critical information needed for analysis. In coding, categories are the partitioning of a set, and categorization is the process of using rules to partition a body of data.
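The coding schemes mentioned above can be written down as simple lookup tables. The sketch below uses the 1/2 gender codes and the five-point Likert codes given in the text; the helper `code_response` and its handling of unexpected answers (returning `None` so that the editor can review them) are assumptions made for the example.

```python
GENDER_CODES = {"male": 1, "female": 2}

LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_response(raw_answer, scheme):
    """Return the numeric code for an answer, or None if it is not in the scheme."""
    if raw_answer is None:
        return None
    # Ignore differences in capitalisation and surrounding spaces.
    return scheme.get(raw_answer.strip().lower())


print(code_response("Female", GENDER_CODES))
print(code_response(" Agree ", LIKERT_CODES))
print(code_response("maybe", LIKERT_CODES))
```

Keeping each scheme in one place means that every person entering data applies the same rule, which is the same purpose a codebook serves.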
One of the easiest ways to develop coding structure for the questionnaire is to develop a codebook. A codebook, or coding scheme, contains each variable in the study and specifies the application of coding rules to the variable. It is used by the researcher or research staff as a guide to make data entry less prone to error and more efficient. It is also the definitive source for locating the positions of variables in the data file during analysis. Most codebooks - computerized or not - contain the question number, variable name, location of the variable's code on the input medium, descriptors for the response options, and whether the variable is alpha (containing a - z) or numeric (containing 0 - 9). Table 7.1 below provides an example of a codebook.
Table 7.1: Sample codebook for a study on DVD rentals
Variable instructions | SPSS Variable name | Coding
Identification n° |
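A codebook of the kind described in this section can also be kept in machine-readable form and used to check entered data. The sketch below is illustrative only: the variable names, question numbers, column positions and codes are hypothetical and are not taken from Table 7.1, and `check_record` is an invented helper.

```python
# Hypothetical codebook: one entry per variable, recording the question
# number, the position in the data file, the type and the valid codes.
CODEBOOK = {
    "id": {"question": None, "column": 1, "type": "numeric", "codes": None},
    "gender": {
        "question": "Q1", "column": 2, "type": "numeric",
        "codes": {1: "male", 2: "female"},
    },
    "q2_agree": {
        "question": "Q2", "column": 3, "type": "numeric",
        "codes": {
            1: "strongly disagree", 2: "disagree",
            3: "neither agree nor disagree", 4: "agree", 5: "strongly agree",
        },
    },
}


def check_record(record, codebook):
    """Return (variable, value) pairs that do not conform to the codebook."""
    problems = []
    for name, spec in codebook.items():
        value = record.get(name)
        if value is None:
            continue  # missing values are dealt with at the editing stage
        if spec["type"] == "numeric" and not isinstance(value, (int, float)):
            problems.append((name, value))
        elif spec["codes"] is not None and value not in spec["codes"]:
            problems.append((name, value))
    return problems


print(check_record({"id": 17, "gender": 3, "q2_agree": 4}, CODEBOOK))
```

Running such a check on every record after data entry catches codes that fall outside the defined options before they reach the analysis.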