The first stage of most business decision-making is data collection. Most of the time, the information is gathered in the form of words (also called qualitative, or unstructured, data). For example, marketing researchers conduct focus groups, in-depth interviews, and surveys to help product managers and sales representatives select the best product design and the most effective message to convey to customers. Similarly, human resource managers interview candidates to help the company select the best person for the job. Once data collection is complete and the words are available, the data collectors analyze them.
A recent study (Craigie M, Loader B, Burrows R, Muncer S. The Reliability of Health Information on the Internet: An Examination of Expert Ratings. Journal of Medical Internet Research. 2002 Jan-Mar;4(1):e2) quantified experts’ consistency in analyzing qualitative data. The data consisted of the text of 18 threads (series of connected messages) posted on a message board by individuals suffering from a chronic disease. Each thread began with a message, or question, followed by a series of responses, or answers. Five doctors who worked in the same specialist unit and had at least five years of experience treating the chosen disease processed the data. The doctors devised two scales for the task. The initial message, or question, was coded on a six-point scale: A = excellent; B = less excellent but with some details; C = poor with few details; D = vague; E = misleading or irrelevant; F = incomprehensible. The responses, or answers, were coded on another six-point scale: A = evidence-based, excellent; B = conventional wisdom; C = personal opinion; D = misleading, irrelevant; E = false; F = possibly dangerous.
Following data processing, three statistical tests were used to compare the codes assigned by all five experts: kappa, gamma, and Kendall’s W. The findings indicated inadequate agreement among the five experts’ codes for both the initial question and the responses. Additionally, two of the five experts showed statistically significant disagreement in the codes they assigned to the question, and different pairings of experts showed contradictions in the codes they assigned to the responses. In plain English, when one doctor labeled an answer “A = evidence-based, excellent,” another doctor labeled it “E = false” or even “F = possibly dangerous.”
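To make the idea of an agreement statistic concrete, here is a minimal sketch of one of the three measures, kappa, for a single pair of raters. It assumes the two-rater form (Cohen's kappa); the text does not say which variant the study used. The function name and the two lists of codes are hypothetical and invented for illustration; they are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability of a match if each rater assigned codes
    # independently, at the frequencies that rater actually used.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes (A-F) for 18 responses, invented for illustration only.
doctor_1 = list("AABBCCAABDCCABBCAD")
doctor_2 = list("EBBCCFABCDCEAFBCBD")
print(round(cohens_kappa(doctor_1, doctor_2), 2))
```

A kappa of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean the raters agree less often than chance would predict.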
Consider the following:
1. The analysts in this study were physicians with at least five years of experience treating the particular chronic disease.
These analysts have a significantly higher level of expertise in the research subject than even the most experienced market researchers analyzing qualitative customer data or human resource managers analyzing candidate data. Therefore, if these highly trained experts are unable to demonstrate consistent qualitative data processing, what are the chances that less trained professionals will demonstrate consistent data analysis?
2. In this study, the criterion was whether an answer was “evidence-based” (see code A).
This is a measurable criterion. Unlike this study, the vast majority of qualitative research in business is based on subjective criteria such as preferences, morals, values, or tastes. If doctors were unable to apply a single objective criterion consistently when coding the text, how can less trained professionals be trusted to apply a large number of subjective criteria consistently when evaluating qualitative data?
3. Should you be concerned when a market researcher analyzes your focus groups?
A typical focus group transcript runs to approximately 12,000 words. This study collected data from 18 threads. An average thread contains approximately five postings of approximately 120 words each. These figures indicate that the data in this study totaled about 10,800 words (18 × 5 × 120), which is less than the length of a single focus group. In comparison, a typical market research study includes four to eight focus groups, or four to eight times that amount of text. Therefore, if the experts in this study were unable to demonstrate consistency with a dataset smaller than a single focus group, what are the chances that a market researcher will demonstrate consistency with a much larger dataset?
4. How concerned should you be when a human resource manager conducts a candidate analysis?
A transcript of an hour-long interview contains approximately 6,000 words (when hiring middle and top managers, the interviews might take a whole day, with an order of magnitude more words). Even when only a few candidates are interviewed, the total data may amount to 30,000 or more words (for five candidates). Therefore, if the experts in this study were unable to demonstrate consistency with a dataset smaller than two interview transcripts, what are the chances that a human resource manager will demonstrate consistency with a much larger dataset?
5. How concerned should you be if an investment analyst analyzes some of your companies on your behalf?
Annual reports that run to tens of thousands of words are not uncommon. IBM’s 2004 annual report, for example, is 100 pages long and contains more than 65,000 words. Therefore, if the experts in this study were unable to demonstrate consistency when analyzing a dataset less than 17% the size of a single company’s annual report, what are the odds that an investment analyst will demonstrate consistency when analyzing a much larger dataset (such as annual reports, financial statements, and press releases from a few companies)?
6. In this study, a pair of physicians assigned distinct codes to the same question or response.
For example, one doctor might label an answer “A = evidence-based, excellent,” while another might label it “E = false” or even “F = possibly dangerous.” Who is correct? After all, this is medicine, and the two positions cannot both be correct. Whom are you to believe? And what should you do as a decision maker? If you believe the first physician, you should take the response as sound advice and follow it. If you believe the second physician, you should flee for your life. Now, if such eminent experts have failed to convince us that they can process a small dataset correctly, or at least consistently, how can we believe other professionals when they claim they can?
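The size comparisons in points 3 to 5 can be checked with a few lines of arithmetic. The sketch below uses only the approximate word counts quoted above; the variable and function names are made up for illustration.

```python
# Approximate word counts quoted in the text.
THREADS = 18
POSTINGS_PER_THREAD = 5
WORDS_PER_POSTING = 120
FOCUS_GROUP_WORDS = 12_000    # one focus group transcript
INTERVIEW_WORDS = 6_000       # one hour-long interview transcript
ANNUAL_REPORT_WORDS = 65_000  # IBM's 2004 annual report, lower bound per the text

# Total size of the dataset the five doctors coded.
study_words = THREADS * POSTINGS_PER_THREAD * WORDS_PER_POSTING


def share_of(study, other):
    """Size of the study dataset as a fraction of another dataset."""
    return study / other


comparisons = [
    ("one focus group", FOCUS_GROUP_WORDS),
    ("two interviews", 2 * INTERVIEW_WORDS),
    ("five interviews", 5 * INTERVIEW_WORDS),
    ("one annual report", ANNUAL_REPORT_WORDS),
]

print(f"Study dataset: {study_words:,} words")
for label, words in comparisons:
    print(f"  {share_of(study_words, words):.0%} of {label} ({words:,} words)")
```

In every comparison the study's dataset is the smaller one: it comes to roughly one-sixth of a single annual report of that length.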
The first stage of the majority of business decision-making is data collection. The majority of the time, information is gathered in the form of words. Once the words are available, the data collection professionals analyze them and present the results to the decision maker. As the study by Craigie et al. indicates, these professionals frequently fail to analyze qualitative data properly, producing results that prevent the decision maker from making the correct choice.