Essbase® Analytic Services Database Administrator's Guide

Mining an Analytic Services Database


Data mining is the process of searching through an Analytic Services database for hidden relationships and patterns in a large amount of data. Using traditional query tools, you can answer many standard questions about your business, such as "How profitable were root beer sales in New York in July?" Data mining helps you to search through large amounts of data and come up with new questions, such as "What will root beer sales look like in August?"

This chapter describes data mining and explains how to mine Analytic Services databases.

Understanding Data Mining

Data mining tools can sift through data to come up with hidden relationships and patterns. You may find that people who bought root beer in July also bought ice cream in July. Then you can use this knowledge to create an advertising campaign around root beer floats.

You can use data mining to tell you things about existing data, as in the root beer example. Such data mining is called descriptive. You can also use data mining to forecast future results based on past performance. For example, you can forecast sales for next year based on sales for this year. Such data mining is called predictive.

Essbase Analytic Services Data Mining Framework

Data Mining Framework is a collection of features that enables the performance of data mining on Analytic Services databases. Data Mining Framework is licensed separately from Analytic Services.

Note: Data Mining Framework does not currently operate on relational data, that is, hybrid analysis data. Data mining is also not supported currently for Unicode-mode applications.

The process of mining an Analytic Services database includes the following tasks:

- Creating a data mining model
- Training the model
- Applying the model to new data
- Testing the model against known results
- Viewing the data mining results

The following sections describe the data mining process in more detail.

Creating Data Mining Models

The first step in building a data mining model is to understand the business situation or problem and choose the appropriate algorithm to use for analysis or prediction. Algorithms determine how you search through data.

Data Mining Framework provides a set of powerful algorithms that can be used for a broad range of business applications. See Built-in Algorithms for a description of the available algorithms. The description of each algorithm provides information about the type of problem that it is best suited for.

Note: Data Mining Framework also enables you to easily register and use new algorithms created by Hyperion or third-party vendors.

For example, suppose you want to determine how sales of televisions, DVDs, and VCRs in the East region correspond to sales of cameras in the same region. You can use the regression algorithm to model sales of cameras as a function of sales of TVs, VCRs, and DVDs.

The regression algorithm predicts the value of a dependent (target) variable based on the value of one or more independent (predictor) variables.
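To make the prediction idea concrete, here is a minimal sketch of what a regression algorithm computes, reduced to a single predictor and written in plain Python. This is an illustration only, not the Data Mining Framework API, and the sales figures are invented:

```python
# A least-squares line fit: the simplest form of the regression idea.
# The real algorithm handles multiple predictors (TV, DVD, and VCR);
# all data below is hypothetical.

def fit_line(xs, ys):
    """Fit ys = slope * xs + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly TV sales (predictor) and camera sales (target)
tv_sales = [10, 20, 30, 40]
camera_sales = [12, 22, 32, 42]

slope, intercept = fit_line(tv_sales, camera_sales)
print(slope, intercept)  # → 1.0 2.0
```

Once the slope and intercept are known, a new TV sales figure yields a predicted camera sales figure, which is the essence of predicting a dependent variable from independent ones.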

Using an Algorithm

All data mining algorithms require training, or learning. Training is the process of executing an algorithm against a representative set of data to discover and describe the patterns and relationships in the data that can be used for prediction. Once a model has been trained against a representative set of data, it can then be applied to a wider set of data to derive useful information.

Learning can be either supervised or unsupervised. Supervised learning discovers patterns and relationships in the data by comparing the resulting data to a set of known data. Unsupervised learning discovers patterns and relationships in the data without comparing the data to a known set of data.

The algorithm vendor determines the specific information that you must provide to use the algorithm. The regression algorithm employs supervised learning. Therefore, the model requires both input data and output data.

To use an algorithm, you enter its settings, parameters, and accessors.

Settings are defined by the Data Mining Framework itself and are therefore the same for all algorithms, but they influence how an algorithm operates. For example, the framework provides a setting that defines how the algorithm treats missing values.

Parameters are specific to each algorithm and determine how the algorithm operates on the data. For example, the clustering algorithm provides a parameter to specify the maximum number of clusters to return.

Accessors specify the input and output data for the algorithm. In a build task, supervised algorithms such as regression have two accessors, a predictor to define the independent or input data, and a target to define the dependent or expected output data. Unsupervised algorithms, such as clustering, have a predictor accessor only.

Test and apply tasks generally have both predictor accessors to define the input and target accessors to define the output.

Accessors consist of domains, typically MaxL DML expressions, that define the components of the accessor. For example, the predictor accessor for the regression algorithm contains the following domains:

- Predictor: the members that supply the independent (input) data
- Sequence: the members, such as time periods, over which the algorithm iterates
- External: the members, such as the cities in a region, for which separate result models are built
- Anchor: the fixed member combination, such as year, scenario, and measure, that restricts the data

The target accessor has the same set of domains as the predictor. You write MaxL DML expressions to define the predictor and target accessors.

For example, consider this sample data mining problem:

Given the number of TVs, DVDs, and VCRs sold during a particular period, in the East region, how many cameras were sold in the same period in the East? Restrict sales data to prior year actual sales.

Using the regression algorithm, the predictor and target accessors to define the model for this problem are as follows:

Predictor.Predictor    {[Television], [DVD], [VCR]}
Predictor.Sequence     {[Jan 1].Level.Members}
Predictor.External     {[East].Children}
Predictor.Anchor       {([2001], [Actual], [Sales])}

Target.Target          {[Camera]}
Target.Sequence        {[Jan 1].Level.Members}
Target.External        {[East].Children}
Target.Anchor          {([2001], [Actual], [Sales])}


Note: In this example, the target accessor uses the same domain expressions as the predictor accessor, except for the target domain itself ({[Camera]}). However, the domain expressions for different accessors are not required to be the same. The only requirement is that a predictor component (for example, Predictor.Sequence) and the corresponding target component (Target.Sequence) must be the same size.

For each city in the East ({[East].Children}), the algorithm models camera sales as a function of TV, DVD, and VCR sales. Data Mining Framework creates, under a single name, a family of results, or models: a separate result for each city in the East.
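The "family of models" behavior of the external domain can be sketched as follows. This is an illustrative Python loop, not the Framework's implementation; the city names and sales figures are invented:

```python
# One model per city, all stored under a single name (here, one dict).
# City names and sales figures are hypothetical.

def fit_line(xs, ys):
    """Fit ys = slope * xs + intercept by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# city -> (TV sales by period, camera sales by period)
east_sales = {
    "New York": ([10, 20, 30], [15, 25, 35]),
    "Boston":   ([5, 10, 15], [12, 22, 32]),
}

# A separate result for each member of the external domain
camera_models = {city: fit_line(tv, cam)
                 for city, (tv, cam) in east_sales.items()}
print(camera_models["Boston"])  # → (2.0, 2.0)
```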

Training the Model

The final step of specifying a build task is to execute the algorithm against the data specified by the accessors to build or train the model. During the training process, the algorithm discovers and describes the patterns and relationships in the data that can be used for prediction.

Internally, the algorithm represents the patterns and relationships it has discovered as a set of mathematical coefficients. Later, the trained model can use these patterns and relationships to generate new information from a different, but similarly structured, set of data.

Note: If you cancel a data mining model while you are training it, the transaction is rolled back.

Applying the Model

After the model is trained, it is ready to use on a new set of data. To apply a model to a new set of data, you specify an apply task. In the apply task you specify a build model you trained and a set of accessors. Generally, the values are the same for the predictor and sequence domains for a build task and its related apply task. You change the external or anchor domain to apply the model to a different set of data. For example, you could change the external domain to specify a different region or country. Or you could use the anchor domain to specify a different year.

In the apply task, the target result data is not known, but is to be predicted by the model.

The apply task applies the model coefficients it generated to the new set of data and generates a set of output data. The Data Mining Framework writes the result data back to the Analytic Services cube. The apply task generates a result record that you can use to query the result data.
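The core of an apply task, using the coefficients of a trained model to generate predictions from new predictor data, can be sketched like this. The coefficients and the region figures are invented for illustration; the real apply task reads and writes Analytic Services cube data:

```python
# Applying a trained model's coefficients to a new set of data.
# Values are hypothetical; this illustrates the concept only.

def apply_model(slope, intercept, xs):
    """Predict target values from new predictor values."""
    return [slope * x + intercept for x in xs]

# Coefficients from a hypothetical build task (camera ~ TV sales)
slope, intercept = 1.0, 2.0

# New predictor data, e.g. after changing the external or anchor domain
west_tv_sales = [50, 60, 70]
predicted_cameras = apply_model(slope, intercept, west_tv_sales)
print(predicted_cameras)  # → [52.0, 62.0, 72.0]
```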

Testing the Model

Data mining models are built using known data to train the algorithm so it can be applied to a similar data set. To test a model, you create a test task. In the test task, you specify a model you have trained and a set of accessors. In addition to the predictor and target accessors, you specify test accessors that reference a known set of results.

The test task compares the results of the trained model to the set of known results you specify and determines whether they match within a specified range of expected error. If the results do not match, you can revise the model and train it again.

See "Creating or Modifying a Test Task" in Essbase Administration Services Online Help for information about creating a test task.
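The comparison a test task performs can be sketched as follows. The error measure, the tolerance, and all values here are invented for illustration and are not the Framework's actual test criteria:

```python
# Compare a model's predictions to known results within a tolerance.
# Mean absolute error is used only as an example error measure.

def within_tolerance(predicted, actual, max_mean_abs_error):
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors) <= max_mean_abs_error

predicted = [52.0, 62.0, 72.0]   # output of the trained model
actual    = [50.0, 63.0, 71.0]   # known results for the test data
print(within_tolerance(predicted, actual, 2.0))  # → True
```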

Viewing Data Mining Results

Data Mining Framework writes mining results back to the Analytic Services cube. Data Mining Framework creates a result record, in XML format, that contains accessors that specify the location of the result data in the cube.

You can view data mining results through the Data Mining node in Administration Services or by using MaxL statements.

Preparing for Data Mining

The one essential prerequisite for performing data mining is that you understand your data and the problem you are trying to solve. Data mining is a powerful tool and can yield new insights. However, if you already have a strong hunch about your data, then data mining can be particularly useful in confirming or denying your hunch, and giving you some additional insights and directions to follow.

Before you mine an Analytic Services database, make sure that the database is loaded and calculated.

Built-in Algorithms

Hyperion supplies a set of basic algorithms, including the regression and clustering algorithms used as examples in this chapter.

Accessing Data Mining Functionality

Data Mining Framework is supported by the MaxL and Administration Services interfaces.

MaxL provides a set of statements explicitly for data mining. With MaxL you can perform all data mining functions, including creating, training, testing, and applying a data mining model.

Sample model templates for each of the algorithms are available through the Administration Services interface. A model template provides an outline of the accessors needed for that algorithm that you can fill in. It also sets some parameters required by the algorithm.

Administration Services enables you to manage data mining models, templates, transformations, and results. It provides a mining wizard that steps you through the process of creating and training a build model, and creating and applying apply and test models. See "Mining an Analytic Services Database" in Essbase Administration Services Online Help.

Creating New Algorithms

You can create your own algorithms using Java and register them with Data Mining Framework. In order to be recognized by Data Mining Framework, an algorithm must implement certain interfaces and have a specific signature. These requirements are described in detail in the Algorithm Vendor's Guide shipped as part of the Data Mining Framework SDK.

After a new algorithm is registered, it appears in the list of supplied algorithms and you can use the new algorithm to create build and apply tasks. Data Mining Framework reads the instructions for each parameter in the algorithm from the algorithm signature. The instructions appear in the Build and Apply Wizards in the panels where the user sets the algorithm parameters, just like the instructions for the supplied algorithms.


