Data Mining: From Discovery to Enlightenment
In today's 24/7 economy, also known as the knowledge economy, the ability to act faster and more effectively than the competition can determine an organization's success in its market and industry. This new economy is fueled by information, which plays an essential role in building and sustaining competitive advantage. Successful, enterprising organizations are those that effectively collect, evaluate and apply information, and they consequently emerge as consistent industry leaders.
Advances in information technology and the proliferation of software and hardware tools have increased the volume of data that organizations generate, collect and store. However, collection is only the first step; data alone achieves nothing. To extract the maximum intelligence from the available data and get the greatest improvement in its decision making, an organization has to manipulate the data to yield relevant information that provides deeper knowledge about a particular business problem or situation. The challenge, therefore, lies in transforming raw data into insightful information and valuable knowledge.
Looking for Valuable Gems
In a business environment where data generation seems to know no bounds and organizations crave deeper knowledge, data mining is one tool that meets the challenge of making sense of large volumes of data effectively. This powerful, technology-backed tool has great potential to help companies transform their data into information and knowledge that is highly relevant to business decisions. Data mining can help organizations strike "gold and gems" of substantial business value in unexpected places. Data mining, a term often used interchangeably with Knowledge Discovery in Databases (KDD), is described as "the process of identifying valid, novel, potentially useful, and ultimately comprehensible patterns or models in data to make crucial business decisions"; it works by automatically searching large volumes of data. It combines techniques from statistics, computer science and artificial intelligence.
The data can be of any kind that is captured and stored: customer data, store data, demographic data or geographic data. A data mining exercise can unearth patterns, trends or rules that are implicit in the data, revealing significant facts, relationships, behaviors, exceptions and anomalies that analysts would otherwise miss. It aims to uncover gems in the form of non-intuitive relationships between variables. Such unanticipated discoveries are a far more valuable step toward turning data into a viable solution to a business problem.
Companies around the world have practiced data mining for more than a decade to analyze data, identify relationships, find interesting explanations, and identify and target current and potential customers. It helps discover information within the data that traditional tools and techniques, such as query and reporting or OLAP (Online Analytical Processing), do not effectively reveal.
A Business Process, Not Just a Software Tool
It is important to note that data mining is a business process, not just a software tool. It provides insights that the organization can use to conduct its business more effectively, and it can make predictions about the future to guide business decisions. Data mining uses hardware, software and "warmware" (skilled labor) to identify previously unknown but potentially useful relationships in large data sets. It aids the transformation of data into information, knowledge and wisdom, a cycle that every organization increasingly needs to follow to remain competitive.
Data mining is a process with many elements: formulating business goals and mapping them to data mining goals; acquiring, understanding and preprocessing the data; evaluating and presenting the results of the analysis; and finally, deploying those results to achieve the business goals. Business knowledge is therefore of paramount importance, and the process relies heavily on the input and involvement of non-technical business professionals to produce results that lead to business benefits. Performed without business knowledge, data mining can produce useless results. A data mining project can succeed only with clear, well-defined objectives and critical thinking from business experts, coupled with the interpretation skills of analysts.
Relation to Data Warehousing
Most customer-focused enterprises treat every interaction with a client or prospect as a learning opportunity: they capture large volumes of customer-related data from many sources and organize it in a consistent and useful way. This practice, called data warehousing, gives the enterprise a "memory" of its customers, allowing it to remember what it has noticed about them. But memory without intelligence yields nothing. Data mining can supply the "intelligence", sifting through the memory to reveal patterns, devise rules, suggest novel ideas to try and make predictions about the future. Data mining thus provides tools and techniques that add intelligence to the data warehouse and transform the warehoused data into actionable information. However, although data mining benefits from a properly designed data warehouse (since warehoused data is well organized, relatively clean and easy to access), a data warehouse is not a prerequisite for applying data mining tools and techniques to an organization's data.
The Process: From Data to Intelligence
Data mining has three components: the captured data, often held in a data warehouse; the mining of this data; and the organization and presentation of the mined information to provide actionable knowledge and intelligence. The process involves a series of steps that must be performed in order to yield the intelligence an organization seeks.
When embarking on a data mining project, identifying the right business problem is the trickiest and most critical part.
Step 1 in the data mining process is defining the business problem that data mining is meant to address. As mentioned earlier, if the problem definition is not accurate and precise, the data mining exercise may prove to be nothing more than a costly mistake that yields meaningless or misleading results.
Step 2 involves gathering and preparing the data for the project. In most cases, organizations have already gathered the raw data, which may or may not be organized and stored in a data warehouse. The main tasks in this step are to access the right data, capture the relevant samples and cleanse the data to eliminate errors and redundancy. Data preparation transforms the raw data into sanitized, organized batches of usable data; without it, the mining exercise may yield skewed or incorrect results.
Step 3 involves applying analytical methods to create, test, evaluate and interpret mining models that can extract the desired information from the prepared data in the context of the business problem at hand. The aim is to build the models most likely to generate useful intelligence from the gathered data.
Step 4, the final step, involves disseminating the newly discovered information. The models built in the previous step are applied to the prepared data to obtain the desired information, which may then be transformed into actionable knowledge through further analysis. This knowledge is then delivered to the appropriate stakeholders within the organization through custom reports containing insights that support better business decisions.
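The cleansing tasks described in Step 2 can be illustrated with a small sketch. It assumes hypothetical customer records (the field names customer_id, region and annual_spend are invented for illustration) and shows three typical operations: normalizing inconsistent text, discarding records with errors, and removing duplicate records. It uses only the Python standard library; in practice, the preparation features of the chosen data mining tool would usually do this work.

```python
from typing import Dict, List, Optional

# Hypothetical raw customer records, as they might arrive from several sources.
RAW_RECORDS: List[Dict[str, Optional[str]]] = [
    {"customer_id": "C001", "region": " north ", "annual_spend": "1200.50"},
    {"customer_id": "C001", "region": "North", "annual_spend": "1200.50"},  # duplicate
    {"customer_id": "C002", "region": "South", "annual_spend": "n/a"},      # not numeric
    {"customer_id": None, "region": "East", "annual_spend": "300"},         # missing ID
    {"customer_id": "C003", "region": "west", "annual_spend": "845"},
]


def clean_records(raw: List[Dict[str, Optional[str]]]) -> List[Dict[str, object]]:
    """Normalize fields, drop records with errors, and remove duplicates."""
    cleaned: List[Dict[str, object]] = []
    seen_ids = set()
    for rec in raw:
        cid = rec.get("customer_id")
        if not cid:
            continue  # error: no identifier, so the record cannot be linked
        try:
            spend = float(rec.get("annual_spend") or "")
        except ValueError:
            continue  # error: spend value is missing or not numeric
        if cid in seen_ids:
            continue  # redundancy: keep only the first valid record per customer
        seen_ids.add(cid)
        region = (rec.get("region") or "").strip().title()  # " north " -> "North"
        cleaned.append({"customer_id": cid, "region": region, "annual_spend": spend})
    return cleaned


if __name__ == "__main__":
    for row in clean_records(RAW_RECORDS):
        print(row)
```

Of the five raw records, only two survive: the first C001 record (with its region normalized) and C003.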
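As a minimal illustration of creating, testing and evaluating a model, the sketch below fits a one-rule "decision stump" to hypothetical churn data: it picks the inactivity threshold that best separates churned from retained customers in a training set, then measures accuracy on held-out records. The data, the churn framing and the single-feature rule are assumptions chosen for brevity; real mining models usually draw on many variables and more robust validation.

```python
from typing import List, Tuple

Record = Tuple[float, int]  # (months since last purchase, churned: 1 = yes, 0 = no)

# Hypothetical labeled customer history.
HISTORY: List[Record] = [
    (1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1),
    (2, 0), (7, 1), (8, 1), (3, 1), (1, 0), (9, 1),
]


def split(data: List[Record], test_fraction: float = 0.25) -> Tuple[List[Record], List[Record]]:
    """Hold out the last part of the data as a test set (no shuffling, for reproducibility)."""
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]


def accuracy(threshold: float, data: List[Record]) -> float:
    """Share of records where the rule 'churn if inactive >= threshold' matches the label."""
    hits = sum(1 for months, churned in data if int(months >= threshold) == churned)
    return hits / len(data)


def fit_stump(train: List[Record]) -> float:
    """Create the model: choose the threshold with the best training accuracy."""
    candidates = sorted({months for months, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


if __name__ == "__main__":
    train, test = split(HISTORY)
    threshold = fit_stump(train)
    print(f"rule: churn if inactive >= {threshold} months")
    print(f"train accuracy: {accuracy(threshold, train):.2f}")
    print(f"test accuracy:  {accuracy(threshold, test):.2f}")
```

Here the rule fits the training data perfectly but scores lower on the held-out records, which is why testing is kept separate from model creation: performance on data the model has not seen is the better guide to its business value.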
Some Data Mining Software Tools
The complexity of data mining increases with the volume of data that companies capture. Various software tools, such as SPSS®, Oracle®, SAS® and Statistica, can be used for data mining. The choice of technique, or combination of tools and techniques, in a particular situation depends on the nature of the data mining task as well as the nature of the available data. Data mining was once a complex, expensive and somewhat limited tool adopted primarily by large companies. This pattern is changing as new technologies evolve, and data mining tools are becoming easier to use for employees without extensive training or operational and statistical skills.
According to Aaron Zornes, Research Director at META Group, a leading provider of IT research, advisory services and strategic consulting, "Data mining is still a relatively expensive investment for businesses, but the complexity of the tools continues to decrease as these tools become more accessible to the corporate middle class". In its September 2004 METAspectrum report on data mining, META Group ranked SPSS®, SAS® and Oracle® as leaders in the data mining tools market, based on the maturity and stability of their tools and their large market share relative to the competition. Other vendors' tools, including those of IBM®, KXEN, Fair Isaac®, Insightful®, Quadstone and Angoss, were ranked as challengers.
Benefits: Why Is Data Mining Gaining Popularity?
"Companies that manage their data as a strategic resource and invest in its quality are already pulling ahead in terms of reputation and profitability from those that fail to do so." (PricewaterhouseCoopers, Global Data Management Survey 2001)
The increased, ready availability of data, coupled with inexpensive processing power, has made data mining particularly well suited to solving business problems. Graphical interfaces have also made the tools more useful and more accessible to business experts.
According to Gartner, businesses in 2004 were expected to manage thirty times more data than in 1999, while the average company used only 7% of the information in its data warehouse. It has been repeatedly acknowledged that an informed understanding of one's customers, drawn from the company's customer database, is the most potent competitive advantage for delivering an outstanding customer experience. Businesses not only understand the value of collecting customer data, but also recognize the important challenge of leveraging this knowledge to create intelligent, proactive access points to the customer.
Data mining helps businesses sift through this mountain of seemingly unrelated data and convert it into meaningful relationships. This in turn helps them move beyond merely reacting to customers and proactively anticipate their needs. According to marketers, a business can boost profits by 25% to 85% by properly analyzing why customers defect to competitors, eliminating those reasons, and taking the right actions to retain even a small percentage of the would-be defectors. The same applies to other organizational and business issues: Decision Support Systems can use data mining to identify trends that support informed and effective organizational and business decisions.
Since data mining aims to transform raw data into consistent, accurate and reliable corporate information and knowledge, efficient use of the technology can create new business opportunities through automated prediction of trends and behaviors, improving ROI. Organizations and professionals in a wide range of sectors, from government agencies to businesses in retail, finance, health care, manufacturing, transportation and airlines, already use data mining tools and techniques to exploit available data, predict trends and anticipate shifts in patterns. Knowledge drawn from data is thereby raised to a new level, creating new opportunities and value for organizations in near real time. Data mining is gaining popularity because it lets organizations make better-informed decisions based on an understanding of their business environment, enabling them to maximize profitability and reduce operational costs. It is also valuable for giving a clear picture of future possibilities well ahead of time, so that proactive steps can be taken to mitigate risks and minimize losses.
Industry and Data Mining: Multiple Applications
From marketing professionals to financial wizards, from HR professionals to headhunters, from credit card companies to the banking sector and even the Federal government, none have escaped the lure of data mining and its potential benefits.
Real Time and Targeted Marketing Analytics
Besides spotting sales trends, data mining can support market analysis for customer relationship management (CRM): analyzing customer data to provide insight into consumer needs, wants and buying patterns, and identifying goods and services likely to be in demand. It can help with market segmentation, customer churn prediction, fraud detection, identifying prospects for direct marketing, interactive marketing, market basket analysis, pricing, sales forecasting and customer trend analysis. Besides predicting customer loyalty more accurately, this shortens response times to market changes and aligns products better with customer needs. Companies such as AT&T, A.C. Nielsen and American Express offer clear examples: their customer analytics groups exploit behavioral data to identify specific, attractive segments of the customer base, enabling more effective and efficient customer relationship management.
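Market basket analysis, mentioned above, is commonly expressed through association rules measured by support (the share of baskets containing a set of items) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below computes both for item pairs in a handful of hypothetical baskets; the data and the thresholds are illustrative assumptions, and dedicated tools typically use algorithms such as Apriori to scale beyond pairs.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical transactions: each set is the contents of one shopping basket.
BASKETS: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

Rule = Tuple[str, str, float, float]  # (if-item, then-item, support, confidence)


def pair_rules(
    baskets: List[Set[str]],
    min_support: float = 0.4,
    min_confidence: float = 0.6,
) -> List[Rule]:
    """Return pairwise association rules that meet both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Count each unordered pair once per basket (sorted so pair order is consistent).
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    for lhs, rhs, support, confidence in pair_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, beer -> diapers has confidence 1.0 (every basket with beer also contains diapers), whereas bread -> beer falls below the 0.6 confidence threshold and is dropped.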
Retailer Advantage
Retailers use data mining in inventory logistics to reduce the cost of handling inventory and to relate customer demographics to the products those customers buy. This information can be used to stock appropriate, popular merchandise in new store locations and to identify "hot" products in a particular demographic market that can be replicated in stores with similar demographic characteristics. Retail analysis and supply chain analysis help in analyzing sales trends and are used by companies such as Wal-Mart. Besides helping managers understand cost and revenue trends and anticipate volatile marketplace conditions and product demand, data mining tools can help predict the effectiveness of promotional programs and decide which products to stock in each store. They can also be applied to tracking vendor performance, identifying real and potential problems, analyzing the efficiency of distribution networks and understanding supply chain costs.
Financial Services, Brokerage Houses and Insurance
Applications range from fraud detection in the financial services industry (for example, flagging potentially fraudulent credit card transactions) to banks and brokerage houses using data mining to better understand the relationships within and across accounts. Financial analysts also use data mining to inform investment decisions by studying financial records, data feeds and other information sources. The insurance industry uses data mining for risk analysis and for calculating exposures and expectations over specified time periods. Companies like Canada Life use advanced data warehousing and analysis methods for timely and accurate actuarial studies, and Generali Group uses data mining tools for rapid, flexible analysis of financial market risk and customer credit risk.
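Fraud detection is often framed as spotting transactions that deviate sharply from a customer's normal behavior. The sketch below uses one simple statistical approach, a z-score against each card's past purchase amounts; the card names, amounts and the 3-standard-deviation cutoff are hypothetical, and real fraud systems combine many more signals than amount alone.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Hypothetical past purchase amounts for each card.
PAST_AMOUNTS: Dict[str, List[float]] = {
    "card_A": [25.0, 40.0, 32.5, 28.0, 35.0, 30.0],
    "card_B": [300.0, 280.0, 320.0, 310.0, 295.0],
}

# Hypothetical new transactions to screen: (card, amount).
NEW_TXNS: List[Tuple[str, float]] = [
    ("card_A", 33.0),
    ("card_A", 950.0),
    ("card_B", 305.0),
]


def flag_outliers(
    past: Dict[str, List[float]],
    txns: List[Tuple[str, float]],
    z_limit: float = 3.0,
) -> List[Tuple[str, float, float]]:
    """Flag transactions whose amount lies more than z_limit standard
    deviations from the mean of that card's past amounts."""
    flagged = []
    for card, amount in txns:
        history = past[card]
        mu, sigma = mean(history), stdev(history)  # stdev needs at least 2 past amounts
        if sigma == 0:
            continue  # no spread in past amounts: this sketch skips the test
        z = (amount - mu) / sigma
        if abs(z) > z_limit:
            flagged.append((card, amount, z))
    return flagged


if __name__ == "__main__":
    for card, amount, z in flag_outliers(PAST_AMOUNTS, NEW_TXNS):
        print(f"review {card}: {amount:.2f} (z = {z:.1f})")
```

Only the 950.00 purchase on card_A is flagged; the other two transactions sit well within their cards' usual ranges.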
Manufacturing Firms
Data mining helps manufacturing firms with defect analysis and quality control, both critical to their success. With these tools and techniques, manufacturers can identify the characteristics associated with defective products; this understanding leads to beneficial process changes and improved product quality. The result can be a better reputation in the industry, which in turn may help drive up sales and increase profitability.
Human Resources
Organizations and headhunters can engage in more focused hiring by using data mining techniques to understand the characteristics of top performers. This understanding can help develop hiring profiles with qualifications, personality traits and characteristics best suited for specific tasks.
Health Care
Doctors are now using data mining to predict the effectiveness of surgical procedures, tests and medications. Health care organizations use data mining to examine medical records and understand past trends in order to reduce future costs. Data mining also helps identify and target high-risk segments of the population for proactive treatment. American Healthways, for example, uses predictive modeling to identify specific types of patients who are more susceptible to high-risk conditions. This proactive approach improves patients' quality of life and also helps reduce the strain on hospitals and insurance providers.
Federal Government
The private sector is not alone in tapping the potential of data mining; the Federal Government also uses these techniques and tools. Federal agencies analyze data collected from the public for a wide range of purposes, including determining applicants' eligibility for Federal benefits, detecting potential fraud, waste and abuse in Federal programs, supporting law enforcement and combating terrorism. After the September 11 attacks, government interest in data mining rose sharply as intelligence officials began exploring ways to use it to identify and track individuals suspected of terrorist links.
Not an End, But A Means to an End
Although organizations in general are increasingly interested in data mining, they should be wary of collecting data for its own sake. According to Aim Proximity's Geoff Cooper, businesses should heed Noah's principle: "people survive not by predicting rain, but by building arks. Noah had foresight about rain but if he had not built the ark he still would have drowned." Likewise, to improve profitability, efficiency and performance, organizations have to link the intelligence gained from data mining to business activities. To deliver value, data mining results must be converted into executable campaigns.
Conclusion: The Human Touch
Because data mining software lacks the human experience and intuition needed to tell a relevant correlation from an irrelevant one, "warmware" (in this case, business experts and analysts) will continue to play a significant role in interpreting the knowledge obtained from data mining and turning it into wisdom and executable plans of action. Data mining software can help find the "high-profit" gems buried in mountains of information, but it can never replace skilled analysts. Although data mining yields information that would otherwise be unavailable, that information must be properly interpreted to be useful.
Sources:
http://www.datamentors.com/Files/Overlooked_Relationship_Data.pdf
http://en.wikipedia.org/wiki/Datamining
http://www.statserv.com/datamining.html
http://www.dmreview.com/article_sub.cfm?articleId=4618
http://www.eco.utexas.edu/~norman/BUSFOR/course.mat/Alex/
http://www.tdan.com/i030fe01.htm
http://www.darwinmag.com/read/100103/mining.html
http://www.whitehouse.gov/OMB/legislative/testimony/forman032503.html
http://www.thearling.com/text/integration/integration.htm
http://www.thearling.com/text/whexcerpt/whexcerpt.htm
http://www.oracle.com/technology/products/bi/odm/pdf/bwp_db_odm_10gr2_0905.pdf
http://www.pcs.co.uk/PDFs/Avellino_Discovery-wp.pdf
http://search.epnet.com/login.aspx?authtype=ip,cpid&custid=nypl&profile=ehost&defaultdb=buh