Tuesday, March 10, 2009

Categorical Variables

Categorical variables can often show relationships not found in continuous data.

A categorical variable is one whose values fall into a limited set of discrete groups (categories), rather than along a continuous scale.

For example:

The direction a car turns, left or right, can be represented as
1 = turns left
2 = turns right

Continuous variables can also be made categorical. For the example of age we may say:
1 = Those less than 18
2 = Those 18-69
3 = Those 70 or more
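
As a rough illustration, the age grouping above could be coded as follows. This is only a sketch: the names AGE_CUTPOINTS and age_category are made up for the example, the ages are invented, and it assumes that someone aged exactly 70 belongs to group 3.

```python
from bisect import bisect_right

# Lower bounds of categories 2 and 3 (assumed cutpoints taken from the list above).
# Ages below 18 fall in category 1.
AGE_CUTPOINTS = [18, 70]

def age_category(age):
    """Return 1 (under 18), 2 (18 up to but not including 70), or 3 (70 or more)."""
    # bisect_right counts how many cutpoints are <= age; adding 1 gives the category code.
    return bisect_right(AGE_CUTPOINTS, age) + 1

ages = [5, 17, 18, 45, 69, 70, 92]  # invented ages, for illustration only
categories = [age_category(a) for a in ages]
print(categories)  # [1, 1, 2, 2, 2, 3, 3]
```

Keeping the cutpoints in one place makes it obvious which definition was used when the results are read later.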

These three categories should be driven by a hypothesis we want to test about those age groups.

How the categories are defined can become an art, so it is good to be cautious when viewing results from categorical data. For example, I may run the test with my current age ranges and find no good result. Then I may decide to make
1 = Those less than 24
2 = Those 24-84
3 = Those 85 or more

and find that I now have a great result. Redefining the categories after the fact just to get a good result is not good science. The assumptions behind the categories should always come first, before the results are seen.
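
To see why the choice of cutpoints matters, here is a small sketch that groups the same ages under both definitions. The ages are invented and the helper name categorize is hypothetical; the only point is that the group sizes shift when the boundaries move, so any test run on the groups can shift with them.

```python
from bisect import bisect_right
from collections import Counter

def categorize(age, cutpoints):
    """Return a 1-based category code: 1 below the first cutpoint, 2 below the second, else 3."""
    return bisect_right(cutpoints, age) + 1

ages = [16, 19, 22, 23, 30, 52, 68, 72, 80, 88]  # invented ages, for illustration only

# Same people, two different category definitions.
original = Counter(categorize(a, [18, 70]) for a in ages)
revised = Counter(categorize(a, [24, 85]) for a in ages)

print(sorted(original.items()))  # [(1, 1), (2, 6), (3, 3)]
print(sorted(revised.items()))   # [(1, 4), (2, 5), (3, 1)]
```

Nothing about the data changed between the two runs, only the definition of the groups, which is why the definition should be fixed before testing.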
