Starting from Jonathan Lewis' test case "Discrete Dangers" on page 126 of his book "Cost Based Oracle",
we have investigated (and illustrated with graphs) the CBO cardinality estimation algorithm for range-based predicates:
select x from t where x > low_x and x < high_x (open , open )
select x from t where x >= low_x and x <= high_x (closed, closed)
select x from t where x > low_x and x <= high_x (open, closed)
select x from t where x >= low_x and x < high_x (closed, open)
template:
select x from t where x #low low_x and x #high high_x
#low in (">", ">=")
#high in ("<", "<=")
when column x has no associated histogram.
Letting
min_x = min (x) over all rows
max_x = max (x) over all rows
B = (max_x - min_x) / num_distinct (Band width)
and defining
left band : min_x < x < min_x + B
central region : min_x + B < x < max_x - B
right band : max_x - B < x < max_x
we have seen that
For ranges completely contained in one of the two bands
min_x <= low_x < high_x <= min_x + B
or max_x - B <= low_x < high_x <= max_x
the cardinality can depend on both low_x and high_x, only low_x, only high_x, or none (= constant), depending on the value of #low and #high, in a somewhat counterintuitive way. We have found the relevant formulae.
In all the other cases, a slightly modified (modifications underlined) version of Jonathan Lewis' "standard formula" applies:
cardinality = num_rows * (high_x_effective - low_x_effective) / (max_x - min_x)
+ num_rows * correction_for_or_equal_operators
- num_rows * correction_for_special_case
if #low = ">=" and low_x inside the left band then
low_x_effective = min_x + B
else
low_x_effective = low_x
end
if #high = "<=" and high_x inside the right band
high_x_effective = max_x - B
else
high_x_effective = high_x
end
correction_for_or_equal_operators = 0
+ 1/num_distinct if #low = ">="
+ 1/num_distinct if #high = "<="
correction_for_special_case = 0
+ 1/num_distinct if low_x = min_x and #low = ">"
+ 1/num_distinct if high_x = max_x and #high = "<"
We have also graphically illustrated, hopefully for a better intuitive understanding, the case of selection over both an infinitesimal and a finite range for the (open,open) and (closed, closed) cases.