WizRule
[dt_accordion]
[dt_item title=”Q: How do I know that the WizRule algorithm is actually finding all the possible rules?”]
A: The algorithm itself is far too complex to be explained in this guide. But you can perform an experiment: take a data set in which rules have already been defined and have WizRule analyze it. See if WizRule can find the rules!
[/dt_item]
[dt_item title=”Q: WizRule did not discover a certain error in the data. Why?“]
A: There are several possible reasons:
- The case does not deviate from a rule that can be discovered from the data. For example, if a certain item was sold only once, and the wrong price was entered, WizRule will not discover any rule, and thereby will not point to this case as a suspected error.
- The case deviates from a rule that does not match the Rule Type parameters. For example, if a certain item was sold 20 times, 19 times at one price and one time at another, and if the minimum number of cases in a rule is 30, WizRule will not discover the rule in regard to the price of this item, and as a result, will not discover the deviation.
- The case is not inconsistent with the frequencies of the values in the data. For example, if the price in the previously mentioned case appears in the data only once, it may be a special price for this item and not an error. WizRule considers such a price a suspected error only if this price is more often associated with other items.
- The case is explicable by other rules. For example, if the price in the previously mentioned case can be explained by another rule that relates the customer who bought the item to the price, then WizRule will not consider the case a suspected error.
This last point is very important if your data contains fields that are calculated automatically by the system. Suppose that your data contains a Total field that is automatically calculated from the Quantity and Price fields. Suppose also that the price of a certain item sold many times was wrong in one case. Since WizRule will discover the formula rule that relates the Price, the Quantity and the Total, and since this formula holds for the case where the price was wrong, this case will be explicable by the formula rule. As a result, WizRule will not consider it a suspected error. To avoid such a possibility, in cases like this, you should run WizRule twice: once where the Total field is included in the analysis, and a second time where it is not. You do this by checking Ignore “if” and Ignore “then” in the Field Grid. On the first run, WizRule will check if there are deviations from the formula, and on the second, WizRule will discover deviations in the price or the quantity.
[/dt_item]
[dt_item title=”Q: How do I correct the errors that WizRule discovers in the data? “]
A: Following standard auditing norms, WizRule does not modify the data that it analyzes. To correct the data, you must use the program that created it. However, you can export the deviations to an MS Access database, and then create an updating SQL query (see pages 66-67).
[/dt_item]
[dt_item title=”Q: How do I print all the deviations of a rule?“]
A: WizRule reports are limited to those deviations with a level of unlikelihood greater than 0.5. To print all the deviations of a rule, you need to issue a query in your database application. For example, you have the following rule:
If Customer is Summit and Item is Car then Salesperson is Charles
Rule’s probability: 0.95
The rule exists in 100 records.
To issue a report that contains all five deviations from this rule, you should use the following query:
Customer = Summit & Item = Car & Salesperson <> Charles
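The same filter can also be sketched outside a database. The Python example below is only an illustration: the records are invented, and only the field names and values of the rule come from this guide.

```python
# Hypothetical records; only the field names and the rule come from the example above.
records = [
    {"Customer": "Summit", "Item": "Car", "Salesperson": "Charles"},
    {"Customer": "Summit", "Item": "Car", "Salesperson": "Dana"},
    {"Customer": "Summit", "Item": "Truck", "Salesperson": "Dana"},
    {"Customer": "Apex", "Item": "Car", "Salesperson": "Charles"},
]


def rule_deviations(rows):
    """Return rows that meet the rule's conditions but not its result."""
    return [
        row for row in rows
        if row["Customer"] == "Summit"
        and row["Item"] == "Car"
        and row["Salesperson"] != "Charles"
    ]


deviations = rule_deviations(records)
print(deviations)
```

Only the second record is returned: it satisfies both conditions of the rule but names a different salesperson.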
[/dt_item]
[dt_item title=”Q: How can WizRule decrease the number of rules while revealing only the most significant ones? “]
A: In the Rule Type dialog box perform one or more of the following:
- Increase the Minimum probability of if-then rules.
- Increase the Minimum accuracy of formula rules.
- Increase the Minimum number of cases in a rule.
- Decrease the Maximum number of conditions in a rule.
[/dt_item]
[dt_item title=”Q: WizRule generated the Rule Report with no (zero) rules. Why? “]
A: The parameters that you set in the Rule parameters dialog box were too strict for any rule to qualify. Readjust the parameters by lowering one or more of the following:
- Minimum probability of if-then rules
- Minimum number of cases in a rule
[/dt_item]
[dt_item title=”Q: How do I increase WizRule speed of issuing the rules? “]
A: You may set a number of different parameters to influence the speed of issuing the reports. The main methods include:
- Increasing the Minimum number of cases in a rule in the Rule Type dialog box.
- Deleting uninformative fields by marking Ignore in the Field Grid.
- Increasing the RAM of your computer. (Note that if not enough RAM is available, WizRule uses the hard drive to perform the calculations, which decreases its efficiency by a factor of 100.)
[/dt_item]
[dt_item title=”Q: What do I do if, when issuing the rules, WizRule becomes stuck in the middle of the calculation stage? “]
A: You can press the Cancel button. WizRule will issue the reports regarding the rules discovered up to that point. Note that when issuing the reports, WizRule periodically updates the total number of the rules it has discovered. By watching the changes being displayed, you can determine how often new rules are revealed; when the rate slows down, you can press Cancel.
[/dt_item]
[dt_item title=”Q: How do I sort the rules in WizRule Rule Report? “]
A: The rules in the Rule Report are sorted by the rule significance level, the rule probability or the number of cases in a rule. The criterion is determined in the Rule Type dialog box. You can re-sort the rules by selecting the Display Rule Option button in the toolbar. If you wish to issue a report where the rules are sorted another way, use the Print option to export the rules to MS Access, and sort the rules there.
[/dt_item]
[dt_item title=”Q: How do I save WizRule reports? “]
A: Whenever you operate WizRule, it creates a *.wwr file, which includes the parameters of the analysis and the last issued reports. You can save this file through the File – Save or File – Save As option. To open the saved file, select it through the File – Open option (or from the list of last four saved files).
Note that when you reopen a file, WizRule keeps the previously issued reports until the new ones are actually issued; then it replaces them with the new reports. If you want to save the previous reports, use the File – Save As option to save the *.wwr file under a different name, and reissue the analysis from the beginning.
[/dt_item]
[dt_item title=”Q: How do I use WizRule rules in another application? “]
A: Use the Print option to export the rules to MS Access, and then use the rules in your application.
[/dt_item]
[dt_item title=”Q: What is the structure of the MS Access table created by the WizRule export option? “]
A: WizRule exports the rules to MS Access in the following two formats:
- Spreadsheet format: Each rule is written in one record, where each column refers to another field in the data set.
- One condition in a line: Each condition (and the result) is written in a separate record.
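The difference between the two layouts can be sketched as follows. This Python example is an assumption-laden illustration: the column names (RuleNum, Role, Field, Value) are hypothetical and are not the schema that WizRule actually writes to MS Access.

```python
# Rule from this guide: If Customer is Summit and Item is Car then Salesperson is Charles.
# All column names below are hypothetical; they are not WizRule's actual schema.

# Spreadsheet format: one record per rule, one column per data-set field.
spreadsheet_rows = [
    {"RuleNum": 1, "Customer": "Summit", "Item": "Car", "Salesperson": "Charles"},
]


def to_one_condition_per_line(rows, result_field):
    """Expand each rule into one record per condition, plus one for the result."""
    lines = []
    for row in rows:
        for field, value in row.items():
            if field == "RuleNum":
                continue
            role = "then" if field == result_field else "if"
            lines.append({"RuleNum": row["RuleNum"], "Role": role,
                          "Field": field, "Value": value})
    return lines


condition_rows = to_one_condition_per_line(spreadsheet_rows, "Salesperson")
for line in condition_rows:
    print(line)
```

The single spreadsheet record becomes three records: two conditions and one result, all sharing the same rule number.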
[/dt_item]
[dt_item title=”Q: How do I change the appearance of WizRule reports? “]
A: There are three ways:
- Use the Data Format dialog box to change the report header and the settings for font type, style and size.
- Use the Print option to export the report to MS Access, and edit it there.
- Use the standard Windows functions to cut, copy and paste parts of the report into a word processor (or any other Windows-compliant text-processing application, such as PowerPoint) and edit it there.
[/dt_item]
[dt_item title=”Q: With WizRule I’ve noticed that rule numbers in viewer don’t relate to rule numbers written to MS Access mdb. Is there something I can do about this?“]
A: When exporting the rules to a table, the rules are not sorted (they are saved in the same order they were discovered).
[/dt_item]
[dt_item title=”Q: Can WizRule be used for market basket analysis? Does it use the Apriori algorithm? What about doing sequential analysis?“]
A: WizRule can be used for market basket analysis. WizRule is based on a proprietary association rule algorithm. Like other association rule algorithms, WizRule reveals ALL the if-then rules that meet the user-defined thresholds for the minimum number of cases in a rule (support level) and the minimum probability (confidence level).
Neither the number of fields (products, in market basket analysis) nor the number of records (sale transactions) is limited. The rules can be exported to a database.
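To make the two thresholds concrete, the sketch below computes them for a single candidate rule. It is not WizRule's proprietary algorithm; the transactions are invented, and the sketch assumes that the number of cases is the number of records meeting the rule's condition, which may differ from WizRule's exact definition.

```python
# Hypothetical transactions: each set holds the products in one sale.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
    {"bread", "butter"},
]


def support_and_confidence(baskets, condition, result):
    """Count baskets matching the 'if' part and the share that also match the 'then' part."""
    matching = [b for b in baskets if condition <= b]  # condition is a subset of the basket
    cases = len(matching)
    if cases == 0:
        return 0, 0.0
    hits = sum(1 for b in matching if result <= b)
    return cases, hits / cases


cases, probability = support_and_confidence(transactions, {"bread"}, {"butter"})
print(cases, probability)
```

A rule such as "if bread then butter" would be reported only if both numbers reach the thresholds set by the user.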
[/dt_item]
[dt_item title=”Q: What is the difference between WizRule and ACL or IDEA? Why should I buy WizRule when I have ACL or IDEA?“]
A: ACL or IDEA use pre-written audit programs. When you use a program of this type you have to remember that you have a preconceived notion of what you are looking for and the audit program is going to reflect those notions. You have to know in advance what you are looking for – this requires a hypothesis. What about relationships that exist within the data and are not obvious? For example, let’s say that there is a relationship between delinquencies and the SIC Codes and perhaps there is an anomaly showing that a vendor has accounts that are far beyond the statistical norm for that industry. WizRule will find all of these. More often than not, you will find that there are findings that you missed by not using WizRule.
[/dt_item]
[dt_item title=”Q: How can I make SQL queries from several rules using WizRule?“]
A: Follow these steps:
- Click the Print option.
- In the Print to line, select Access.
- Select the rules to be printed.
The selected rules will be printed into an *.mdb table. You can then use this table to issue queries. Most of our customers prefer this option (rather than the Make SQL option). See the user’s manual for more information.
[/dt_item]
[/dt_accordion]
WizWhy
[dt_accordion]
[dt_item title=”Q: How do I know that the WizWhy algorithm is actually finding all the possible rules? “]
A: The algorithm itself is far too complex to be explained in this guide. But you can perform an experiment: take a data set in which rules have already been defined and have WizWhy analyze it. See if WizWhy can find the rules!
[/dt_item]
[dt_item title=”Q: How do I know that the WizWhy algorithm predicts accurately? “]
A: You can check how well WizWhy predicts without going into the details of the WizWhy algorithm. Select a data set and split it randomly into two parts: one part serves as the training file, and the other as the test file. Select the training file for issuing the rules, and then validate the rules by issuing predictions for the test file. See pages 98 – 100 for more explanations.
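The random split itself can be prepared with any tool before the data reaches WizWhy. The Python sketch below is one possible way; the function name, the 50/50 proportion and the fixed seed are assumptions made for illustration.

```python
import random


def split_train_test(records, test_fraction=0.5, seed=0):
    """Randomly split records into a training part and a test part."""
    shuffled = list(records)               # copy, so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(100))  # stand-in for the rows of a data set
train, test = split_train_test(records)
print(len(train), len(test))
```

Each part would then be saved as its own file, so that the rules are issued from one file and the predictions are checked against the other.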
[/dt_item]
[dt_item title=”Q: How accurate is WizWhy in comparison with other tools for issuing predictions? “]
A: You can compare WizWhy with other tools for issuing predictions by using the method described in the previous answer. Using this method one can compare the accuracy of any two methods without going into details of the mathematical algorithms behind the methods.
[/dt_item]
[dt_item title=”Q: How does WizWhy avoid revealing redundant rules? “]
A: When WizWhy reveals the rules, it deletes the redundant ones. For example, consider the following two rules:
(1) If Field A is a, the dependent variable is r.
(2) If Field A is a, and Field B is b, the dependent variable is r.
Because rule (2) is identical to rule (1), except for the additional condition in (2), its rule probability should be at least 2 percentage points higher than the probability of rule (1). For example, if the rule probability of rule (1) is 70%, the rule probability of (2) should be 72% or higher. If not, (2) is considered to be a redundant rule.
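The comparison described above can be written as a small check. This Python sketch covers only the 2% test between a rule and its more specific variant; the function name is hypothetical, and the sketch does not reproduce how WizWhy searches for or stores rules.

```python
def is_redundant(general_probability, specific_probability, min_gain=0.02):
    """True if the rule with the extra condition adds less than min_gain."""
    # round() guards against floating-point noise in the subtraction.
    return round(specific_probability - general_probability, 10) < min_gain


print(is_redundant(0.70, 0.71))
print(is_redundant(0.70, 0.72))
```

With rule (1) at 70%, a version of rule (2) at 71% is flagged as redundant, while one at 72% is kept.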
[/dt_item]
[dt_item title=”Q: Can WizWhy analyze series? “]
A: WizWhy’s algorithm is not a time series analysis algorithm. However, WizWhy can be applied if you convert your data to a table. Assume that your data set contains the sales figures of the last 100 months. Convert this list into records as follows: the first record contains the sales of the first and second months, the second record contains the figures of the second and third months, and so on. You can add other fields, such as the month name (January, February, and so on) and the difference between month N and month N-1. You will end up with a table having the following fields: Sales of Month N, Sales of Month N-1, % Difference, Month N name. Select Sales of Month N as the dependent variable. WizWhy will reveal the rules explaining the sales in any month as a function of the sales in the previous month and the month name. These rules will present both the trend (the change from the previous month) and the seasonal effect (the month).
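The conversion can be done in any tool before the table is loaded into WizWhy. The Python sketch below builds the four fields named above from a list of monthly figures; the sales numbers and the function name are invented for illustration.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def build_lag_table(sales, first_month_index=0):
    """Turn a monthly sales list into records pairing each month with the previous one."""
    table = []
    for n in range(1, len(sales)):
        previous, current = sales[n - 1], sales[n]
        # Percentage change from the previous month; undefined if that month had no sales.
        pct = (current - previous) / previous * 100 if previous else None
        table.append({
            "Sales of Month N": current,
            "Sales of Month N-1": previous,
            "% Difference": pct,
            "Month N name": MONTHS[(first_month_index + n) % 12],
        })
    return table


sales = [100, 110, 99, 120]  # hypothetical figures; the first one is January
table = build_lag_table(sales)
for record in table:
    print(record)
```

A list of 100 monthly figures yields 99 records, because the first month has no previous month to compare with.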
[/dt_item]
[dt_item title=”Q: What are the parameters that affect WizWhy prediction’s accuracy? “]
A: Usually WizWhy’s predictions are most accurate when you use the WizWhy defaults in the Rule Parameters dialog box. If you reduce the minimum number of cases in a rule, WizWhy will reveal more rules, but the effect of overfitting might increase as well. On the other hand, if you increase the minimum number of cases in a rule, the effect of overfitting will decrease, but some important rules might be missed.
[/dt_item]
[dt_item title=”Q: How does the cost of error affect the accuracy of WizWhy predictions? “]
A: If the difference between misses (WizWhy predicts not 1, but the actual value is 1) and false alarms (WizWhy predicts 1, but the actual value is not 1) matters, you have to enter the cost of errors. The cost of errors expresses the ratio between the importance of avoiding a miss and the importance of avoiding a false alarm. WizWhy’s objective is to minimize the total cost of errors. If the cost of a miss is higher than the cost of a false alarm, WizWhy predictions will include fewer misses than false alarms. When the cost of a miss is equal to the cost of a false alarm, the total number of errors is minimal.
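The effect of the cost ratio can be seen in a small worked example. The Python sketch below compares two invented sets of predictions by their total cost of errors; the error counts and function names are hypothetical, and the sketch does not show how WizWhy itself arrives at its predictions.

```python
def total_error_cost(misses, false_alarms, miss_cost, false_alarm_cost):
    """Weighted sum of the two error types."""
    return misses * miss_cost + false_alarms * false_alarm_cost


# Hypothetical error counts for two candidate sets of predictions.
candidates = {
    "A": {"misses": 10, "false_alarms": 30},
    "B": {"misses": 20, "false_alarms": 15},
}


def cheapest(miss_cost, false_alarm_cost):
    """Name of the candidate with the lowest total cost of errors."""
    return min(
        candidates,
        key=lambda name: total_error_cost(
            candidates[name]["misses"], candidates[name]["false_alarms"],
            miss_cost, false_alarm_cost),
    )


print(cheapest(1, 1))
print(cheapest(5, 1))
```

With equal costs, candidate B wins because it has fewer errors in total (35 against 40). When a miss costs five times as much as a false alarm, candidate A wins (80 against 115) even though it makes more errors overall.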
[/dt_item]
[dt_item title=”Q: How can I decrease the number of misses and false alarms in WizWhy? “]
A: Perform one or more of the following:
- In the Rule Parameters dialog box, decrease the Minimum number of cases in a rule.
- In the Error Costs dialog box make sure that the cost of a miss is equal to the cost of a false alarm.
[/dt_item]
[dt_item title=”Q: How can I decrease the number of rules while revealing only the most significant ones with WizWhy? “]
A: In the Rule Parameters dialog box perform one or more of the following:
- Increase the Minimum probability of if-then rules.
- Increase the Minimum probability of if-then-not rules.
- Increase the Minimum number of cases in a rule.
- Decrease the Maximum number of conditions in a rule.
[/dt_item]
[dt_item title=”Q: WizWhy Rule Report was generated with no (zero) rules. Why? “]
A: The parameters that you set in the Rule Parameters dialog box were too strict for any rule to qualify. Readjust the parameters by lowering one or more of the following:
- Minimum probability of if-then rules
- Minimum probability of if-then-not rules
- Minimum number of cases in a rule
[/dt_item]
[dt_item title=”Q: How do I increase WizWhy speed of issuing the rules? “]
A: You can set a number of different parameters to influence the speed of issuing the reports. The main methods include:
- Increasing the Minimum number of cases in a rule in the Rule Parameters dialog box.
- Reducing the Maximum number of conditions in a rule in the Rule Parameters dialog box.
- Deleting uninformative fields by selecting Ignore in the field grid.
- Increasing the RAM of your computer. (Note that if not enough RAM is available, WizWhy uses the hard drive to perform the calculations, which decreases its efficiency by a factor of 100.)
[/dt_item]
[dt_item title=”Q: What do I do if, when issuing the rules, WizWhy gets stuck in the middle of the calculation stage? “]
A: You can click the Move Forward button. This button appears on the Progress Indicator whenever possible. For example, if you click this button while WizWhy is searching for 3-condition rules, WizWhy will jump to the next stage without completing the search for rules having 3 conditions (or more). You can also select the Cancel button to stop the entire process of issuing the rules.
[/dt_item]
[dt_item title=”Q: What is the largest size of data set that WizWhy can practically analyze? “]
A: The size of the data set is limited neither by the number of fields nor by the number of records. However, to save time, if the data set is very large, consider the following:
- If there are more than 200 – 300 fields, you can start the analysis by issuing the trend report, and then ignore all the fields having a low prediction power.
- If there are more than 500,000 records, you can start by creating a representative sample of the data (using one of the statistical packages) and running WizWhy on this sample. Note that, as a rule of thumb, 1,000 positive cases suffice for revealing the important rules. For example, if the primary probability of the predicted value is 1%, a data set of 100,000 records (of which 1,000 are positive examples) contains enough information to enable WizWhy to reveal the important rules.
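The arithmetic behind this rule of thumb is a single division. The Python sketch below only restates it; the function name is hypothetical, and the result is an expected figure, since a random sample will not contain exactly that many positive cases.

```python
import math


def records_needed(positive_cases, primary_probability):
    """Sample size at which the expected number of positive cases reaches the target."""
    # round() removes floating-point noise before rounding up to a whole record.
    return math.ceil(round(positive_cases / primary_probability, 6))


print(records_needed(1000, 0.01))
print(records_needed(1000, 0.05))
```

At a primary probability of 1%, 100,000 records are needed to expect 1,000 positive cases; at 5%, 20,000 records are enough.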
[/dt_item]
[dt_item title=”Q: How does WizWhy segment the dependent variable into intervals?“]
A: When the dependent variable is continuous and is not analyzed as Boolean, WizWhy cuts it into up to 9 intervals. The segmentation into intervals follows two restrictions: (1) the intervals should be in accordance with the distribution of the values, and (2) the intervals should be as equal as possible.
[/dt_item]
[dt_item title=”Q: How do I save WizWhy reports? “]
A: Whenever you operate WizWhy, it creates a *.wwr file, which includes the parameters of the analysis and the last issued reports. You can save this file through the File – Save or File – Save As option. To open the saved file, select it through the File – Open option (or from the list of last four saved files).
Note that when you reopen a file, WizWhy keeps the previously issued reports until the new ones are actually issued; then it replaces them with the new saved reports. If you want to save the previous reports, use the File – Save As option to save the *.wwr file under a different name, and reissue the analysis from the beginning.
[/dt_item]
[dt_item title=”Q: How do I use WizWhy rules in another application? “]
A: You have two options:
- Use the Print option to export the Rule, Trends, Unexpected Rules, If-and-only-if Rules or Unexpected Cases reports to MS Access, and then use the rules in your application.
- Use the WizWhy ActiveX (OCX) version. This program lets you operate all the WizWhy commands embedded in another application. Note that the ActiveX program is not included in the WizWhy package, and should be purchased separately.
[/dt_item]
[dt_item title=”Q: How do I sort WizWhy rules in the Rule report? “]
A: The rules in the Rule Report are sorted by the rule significance level, the rule probability or number of cases in a rule. The criterion is determined in the Rule Parameters dialog box. If you wish to issue a report where the rules are sorted in another way, use the Print option to export the rules to MS Access, and sort the rules there.
[/dt_item]
[dt_item title=”Q: What is the structure of the Microsoft Access table created by the WizWhy export option? “]
A: WizWhy exports the rules to Microsoft Access in the following two formats:
- Spreadsheet format: Each rule is written in one record, where each column refers to another field in the data set.
- One condition in a line: Each condition (and the result) is written in a separate record.
[/dt_item]
[dt_item title=”Q: How do I change the appearance of WizWhy reports? “]
A: There are three ways:
- Use the Data Format dialog box to change the report header and the settings for font type, style and size.
- Use the Print option to export the report to Microsoft Access, and edit it there.
- Use the standard Windows functions to cut, copy and paste parts of the report into a word processor (or any other Windows-compliant text-processing application, such as PowerPoint) and edit it there.
[/dt_item]
[dt_item title=”Q: Can WizWhy be used for time series analysis?“]
A: Some WizWhy users use it for time series analysis by arranging the data as a moving window. For example:
Record #1: day1, day2, day3, day4
Record #2: day2, day3, day4, day5
Record #3: day3, day4, day5, day6
The dependent variable is the last day in each record. The columns may contain additional fields, such as the % increase from the first day to the last day, etc.
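The moving window can be generated with a few lines of code before the data is loaded. The Python sketch below reproduces the three records shown above; the function name and the window width parameter are assumptions made for illustration.

```python
def moving_windows(series, width=4):
    """Return overlapping records of `width` consecutive values."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


days = ["day1", "day2", "day3", "day4", "day5", "day6"]
records = moving_windows(days)
for number, record in enumerate(records, start=1):
    print(f"Record #{number}:", ", ".join(record))
```

In each record, the last value plays the role of the dependent variable and the earlier values serve as the explanatory fields.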
[/dt_item]
[dt_item title=”Q: Does WizWhy create clusters, use decision trees, or neural nets?“]
A: WizWhy does NOT create clusters and does NOT use decision trees. The main difference is that the decision tree algorithm reveals SOME of the rules, while the WizWhy algorithm reveals ALL the if-then rules!
On top of revealing all of the if-then rules, WizWhy also reveals unexpected rules (that is, interesting phenomena), if-and-only-if rules (necessary and sufficient conditions), and it points out cases deviating from the discovered rules (those are suspected errors or cases of fraud).
The predictions issued by WizWhy are more accurate than those issued by decision trees or neural nets. Since our algorithm is totally different from both decision trees and neural nets, it makes sense to use it as a complementary analyzer and predictor.
[/dt_item]
[dt_item title=”Q: What is the minimal internal memory needed for WizWhy?“]
A: While reading the data WizWhy creates tables. The search for the rules is done on these tables. The size of the table depends on the number of fields and the number of values in each field (having a number of cases higher than the minimum cases in a rule).
WizWhy allocates memory for these tables. If there is not enough memory (RAM), WizWhy uses free space on the hard drive (C:). This slows down the process. To speed up the run time, we recommend increasing the internal memory.
[/dt_item]
[dt_item title=”Q: Is there any academic paper referring to WizWhy?“]
A: Academic papers referring to WizWhy:
Abraham Meidan, Wizsoft’s WizWhy, in Oded Maimon and Lior Rokach (Eds.), The Data Mining and Knowledge Discovery Handbook, Springer 2005, pp. 1365-1369.
http://www.springeronline.com/sgw/cda/frontpage/0,11855,4-102-22-480876677-0,00.html?changeHeader=true
[/dt_item]
[/dt_accordion]
WizSame
[dt_accordion]
[dt_item title=”Q: What is the largest size of data set that WizSame can practically analyze? “]
A: The size of the data set is limited neither by the number of fields nor by the number of records.
[/dt_item]
[dt_item title=”Q: How can I decrease or increase the number of matching sets in WizSame report? “]
A: The following parameters determine the number of matching sets:
- Ignore Field: When you select to ignore a field, WizSame may reveal more matching sets. For example, if you ignore all the fields except for the City, then every group of records having the same city constitutes a matching set.
- Full Match: When you check the full match column, WizSame may reveal fewer matching sets. As long as this column is unchecked, similarity of values is enough for establishing a matching set.
- Conditions connected by OR: When you determine the matching criterion in the Advanced window, you may increase the number of matching sets by connecting the conditions with OR.
[/dt_item]
[dt_item title=”Q: How do I increase WizSame speed of issuing the report? “]
A: The main methods include:
- Deleting uninformative fields by selecting Ignore in the field grid.
- Increasing the RAM of your computer. (Note that if not enough RAM is available, WizSame uses the hard drive to perform the calculations, which decreases its efficiency by a factor of 100.)
[/dt_item]
[dt_item title=”Q: How do I save WizSame report? “]
A: Whenever you operate WizSame, it creates a *.wzs file, which includes the parameters of the analysis and the last issued reports. You can save this file through the File – Save or File – Save As option. To open the saved file, select it through the File – Open option (or from the list of last four saved files).
Note that when you reopen a file, WizSame keeps the previously issued report until the new one is actually issued; then it replaces the report with the new saved report. If you want to save the previous report, use the File – Save As option to save the *.wzs file under a different name, and reissue the analysis from the beginning.
[/dt_item]
[dt_item title=”Q: What is the structure of the MS Access table created by the WizSame export option? “]
A: WizSame exports the matching sets to MS Access in the following two formats:
- Spreadsheet format: Each record is written in one line, where each column refers to another field in the data set. The first column, MatchNum, denotes the matching set number. All the records (i.e., lines) having the same matching set number belong to the same matching set.
- One field in a line: Each field is written in another record (similar to the printed format).
Both formats appear under queries in the MS Access database.
[/dt_item]
[dt_item title=”Q: How do I create a table of all of the duplicate records with WizSame? “]
A: Use the Print Matching Sets from the Issue menu to create an ASCII file where each record is in one line. The first column denotes the matching set number, the second column denotes the record number and the third column denotes the data set number.
[/dt_item]
[dt_item title=”Q: How do I instruct WizSame to ignore certain words (such as street, road, ave., etc.)? “]
A: In the WizSame synonym dictionary, enter a line where the first cell is blank, and the other cells contain the words to be ignored (street, road, etc.). This line tells WizSame that blank is synonymous with these words, and therefore they are ignored.
[/dt_item]
[dt_item title=”Q: How do I save the non-duplicate records while I am analyzing one data set with WizSame?“]
A: From the Issue menu, select Print to a New File and choose the Print Records Not in Matching Sets option to create a new ASCII file that contains all the records except the duplicate records.
[/dt_item]
[dt_item title=”Q: How do I instruct WizSame to reveal duplicate records in my Vendors list and Employees list when the name is similar and the address is not similar?“]
A: After opening the data sets click the ADVANCED button in the Basic Data Tab. In the Field Name column click on the first empty field and select address. In the Similarity column select “is not similar” and in the And/Or column select “and”. Click OK then click on the Issue Rules button.
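The logic of this matching criterion (name is similar AND address is not similar) can be illustrated outside WizSame. The Python sketch below uses the standard library's difflib as a stand-in similarity measure; it is not WizSame's similarity algorithm, and the 0.8 threshold, the sample records and the function names are all assumptions.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumed cut-off, not a WizSame setting


def is_similar(a, b):
    """Stand-in similarity test based on difflib's ratio (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= SIMILARITY_THRESHOLD


# Hypothetical records from the two lists.
vendors = [{"name": "Jon Smith", "address": "12 Oak Street"}]
employees = [
    {"name": "John Smith", "address": "98 Pine Road"},
    {"name": "Mary Jones", "address": "12 Oak Street"},
]


def name_similar_address_not(vendor_rows, employee_rows):
    """Pairs whose names are similar while their addresses are not."""
    return [
        (v["name"], e["name"])
        for v in vendor_rows
        for e in employee_rows
        if is_similar(v["name"], e["name"])
        and not is_similar(v["address"], e["address"])
    ]


pairs = name_similar_address_not(vendors, employees)
print(pairs)
```

Only the Jon Smith / John Smith pair is returned: the names are nearly identical, while the addresses differ.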
[/dt_item]
[/dt_accordion]