pysubgroup package¶
Submodules¶
pysubgroup.algorithms module¶
Created on 29.04.2016
@author: lemmerfn
- class pysubgroup.algorithms.Apriori(representation_type=None, combination_name='Conjunction', use_numba=True)[source]¶
Bases:
object
- class pysubgroup.algorithms.BeamSearch(beam_width=20, beam_width_adaptive=False)[source]¶
Bases:
object
Implements the beam search algorithm. It is a basic implementation.
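As a rough illustration of the general technique, a level-wise beam search over conjunctions of selectors might look like the sketch below. This is not pysubgroup's implementation: the function `beam_search`, its `depth` parameter, the string selectors, and the toy quality function are all hypothetical and chosen only to keep the example self-contained.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Description = FrozenSet[str]


def beam_search(
    selectors: Iterable[str],
    quality: Callable[[Description], float],
    beam_width: int = 20,
    depth: int = 2,
) -> List[Tuple[float, Description]]:
    """Generic beam search sketch (hypothetical helper, not the library's code)."""
    selectors = list(selectors)
    # Start from the empty description, which covers the whole dataset.
    beam = [(quality(frozenset()), frozenset())]
    for _ in range(depth):
        # Carry the current beam over so shorter descriptions can survive.
        candidates = {desc: q for q, desc in beam}
        for _, desc in beam:
            for sel in selectors:
                refined = desc | {sel}
                if refined not in candidates:
                    candidates[refined] = quality(refined)
        # Keep only the beam_width best descriptions for the next level.
        ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
        beam = [(q, desc) for desc, q in ranked[:beam_width]]
    return beam


# Toy quality function (an assumption): the sum of per-selector weights.
weights = {"a": 3.0, "b": 2.0, "c": 1.0}
best = beam_search(weights, lambda d: sum(weights[s] for s in d), beam_width=2, depth=2)
```

With a narrow beam, only refinements of the descriptions kept at the previous level are explored, which is what distinguishes beam search from an exhaustive search.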
- class pysubgroup.algorithms.DFS(apply_representation)[source]¶
Bases:
object
Implementation of a depth-first search with look-ahead using a provided data structure.
pysubgroup.binary_target module¶
Created on 29.09.2017
@author: lemmerfn
- class pysubgroup.binary_target.BinaryTarget(target_attribute=None, target_value=None, target_selector=None)[source]¶
Bases:
BaseTarget
- statistic_types = ('size_sg', 'size_dataset', 'positives_sg', 'positives_dataset', 'size_complement', 'relative_size_sg', 'relative_size_complement', 'coverage_sg', 'coverage_complement', 'target_share_sg', 'target_share_complement', 'target_share_dataset', 'lift')¶
- class pysubgroup.binary_target.ChiSquaredQF(direction='both', min_instances=5, stat='chi2')[source]¶
Bases:
SimplePositivesQF
ChiSquaredQF tests for statistical independence of a subgroup against its complement.
…
- static chi_squared_qf(instances_dataset, positives_dataset, instances_subgroup, positives_subgroup, min_instances=5, bidirect=True, direction_positive=True, index=0)[source]¶
Performs a chi2 test of statistical independence.
Tests whether a subgroup is statistically independent from its complement (see scipy.stats.chi2_contingency).
- Parameters:
instances_dataset, positives_dataset, instances_subgroup, positives_subgroup (int) – counts of subgroup and dataset
min_instances (int, optional) – minimum number of required instances; if the subgroup has fewer, -inf is returned for that subgroup
bidirect (bool, optional) – if True, both directions are considered interesting; otherwise direction_positive decides which direction is interesting
direction_positive (bool, optional) – Only used if bidirect=False; specifies whether you are interested in positive (True) or negative deviations
index ({0, 1}, optional) – decides whether the test statistic (0) or the p-value (1) should be used
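To make the test concrete, the following sketch computes the Pearson chi-squared statistic and its p-value for the 2x2 table formed by a subgroup and its complement, using only the standard library. It is an illustration under stated assumptions, not the library's code: the function name `chi_squared_2x2` is hypothetical, the statistic is computed without a continuity correction (whether the library applies one is not stated here), and the direction handling controlled by bidirect and direction_positive is left out.

```python
import math


def chi_squared_2x2(instances_dataset, positives_dataset,
                    instances_subgroup, positives_subgroup,
                    min_instances=5, index=0):
    """Hypothetical helper: uncorrected chi2 test, subgroup vs. complement."""
    # Too-small subgroups get -inf, mirroring the documented min_instances rule.
    if instances_subgroup < min_instances:
        return float("-inf")
    a = positives_subgroup                              # positives in subgroup
    b = instances_subgroup - positives_subgroup         # negatives in subgroup
    c = positives_dataset - positives_subgroup          # positives in complement
    d = (instances_dataset - instances_subgroup) - c    # negatives in complement
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        statistic = 0.0  # degenerate table: a row or column is empty
    else:
        statistic = instances_dataset * (a * d - b * c) ** 2 / denom
    # Survival function of the chi2 distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    # index selects the test statistic (0) or the p-value (1).
    return (statistic, p_value)[index]


stat = chi_squared_2x2(100, 40, 20, 16)
p = chi_squared_2x2(100, 40, 20, 16, index=1)
```

A larger statistic (or a smaller p-value) indicates that the share of positives in the subgroup deviates more strongly from the share in its complement.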
- class pysubgroup.binary_target.GeneralizationAware_StandardQF(a)[source]¶
Bases:
GeneralizationAwareQF_stats
- class pysubgroup.binary_target.LiftQF[source]¶
Bases:
StandardQF
Lift Quality Function
LiftQF is a StandardQF with a=0. Thus it treats the difference in ratios as the quality without caring about the relative size of a subgroup.
- class pysubgroup.binary_target.SimpleBinomialQF[source]¶
Bases:
StandardQF
Simple Binomial Quality Function
SimpleBinomialQF is a StandardQF with a=0.5. It is an order equivalent approximation of the full binomial test if the subgroup size is much smaller than the size of the entire dataset.
- class pysubgroup.binary_target.SimplePositivesQF[source]¶
Bases:
AbstractInterestingnessMeasure
- property gp_requires_cover_arr¶
- tpl¶
alias of
PositivesQF_parameters
- class pysubgroup.binary_target.StandardQF(a)[source]¶
Bases:
SimplePositivesQF, BoundedInterestingnessMeasure
StandardQF weights the relative size of a subgroup against the difference in averages.
The StandardQF is a general form of quality function that, for different values of a, is order equivalent to many popular quality measures.
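The trade-off controlled by a can be sketched with a small standalone function. It assumes the usual definition from the subgroup discovery literature, quality = (n_sg / n_dataset)^a * (p_sg - p_dataset), where p is the share of positives. The function below is a hypothetical helper written for illustration and does not call pysubgroup.

```python
import math


def standard_qf(a, instances_dataset, positives_dataset,
                instances_subgroup, positives_subgroup):
    """Hypothetical helper: (relative size)^a * (difference in target shares)."""
    if instances_subgroup == 0:
        return 0.0  # an empty subgroup has no defined target share
    relative_size = instances_subgroup / instances_dataset
    target_share_sg = positives_subgroup / instances_subgroup
    target_share_dataset = positives_dataset / instances_dataset
    return relative_size ** a * (target_share_sg - target_share_dataset)


# One subgroup (20 of 100 instances, 16 of 40 positives) under three values of a.
q_a0 = standard_qf(0.0, 100, 40, 20, 16)    # size ignored
q_a05 = standard_qf(0.5, 100, 40, 20, 16)   # size weighted by its square root
q_a1 = standard_qf(1.0, 100, 40, 20, 16)    # size weighted linearly
```

The values 0, 0.5 and 1 used here are the settings that the LiftQF, SimpleBinomialQF and WRAccQF entries describe as special cases of StandardQF.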
- class pysubgroup.binary_target.WRAccQF[source]¶
Bases:
StandardQF
Weighted Relative Accuracy Quality Function
WRAccQF is a StandardQF with a=1. It is order equivalent to the difference in the observed and expected number of positive instances.
pysubgroup.constraints module¶
pysubgroup.datasets module¶
pysubgroup.fi_target module¶
Created on 29.09.2017
@author: lemmerfn
- class pysubgroup.fi_target.AreaQF[source]¶
Bases:
SimpleCountQF
- class pysubgroup.fi_target.FITarget[source]¶
Bases:
BaseTarget
- statistic_types = ('size_sg', 'size_dataset')¶
- class pysubgroup.fi_target.SimpleCountQF[source]¶
Bases:
AbstractInterestingnessMeasure
- gp_requires_cover_arr = False¶
- tpl¶
alias of
CountQF_parameters
pysubgroup.gp_growth module¶
pysubgroup.measures module¶
Created on 28.04.2016
@author: lemmerfn
- class pysubgroup.measures.CombinedInterestingnessMeasure(measures, weights=None)[source]¶
- class pysubgroup.measures.GeneralizationAwareQF(qf)[source]¶
- class pysubgroup.measures.GeneralizationAwareQF_stats(qf)[source]¶
Bases:
AbstractInterestingnessMeasure
- ga_tuple¶
alias of
ga_stats_tuple
pysubgroup.model_target module¶
- class pysubgroup.model_target.EMM_Likelihood(model)[source]¶
Bases:
AbstractInterestingnessMeasure
- property gp_requires_cover_arr¶
- tpl¶
alias of
EMM_Likelihood
pysubgroup.numeric_target module¶
Created on 29.09.2017
@author: lemmerfn
- class pysubgroup.numeric_target.NumericTarget(target_variable)[source]¶
Bases:
object
- statistic_types = ('size_sg', 'size_dataset', 'mean_sg', 'mean_dataset', 'std_sg', 'std_dataset', 'median_sg', 'median_dataset', 'max_sg', 'max_dataset', 'min_sg', 'min_dataset', 'mean_lift', 'median_lift')¶
- class pysubgroup.numeric_target.StandardQFNumeric(a, invert=False, estimator='sum')[source]¶
Bases:
BoundedInterestingnessMeasure
- tpl¶
alias of
StandardQFNumeric_parameters
- class pysubgroup.numeric_target.StandardQFNumericMedian(a, invert=False, estimator='sum')[source]¶
Bases:
BoundedInterestingnessMeasure
- tpl¶
alias of
StandardQFNumericMedian_parameters
- class pysubgroup.numeric_target.StandardQFNumericTscore(a, invert=False, estimator='sum')[source]¶
Bases:
BoundedInterestingnessMeasure
- tpl¶
alias of
StandardQFNumericTscore_parameters
pysubgroup.refinement_operator module¶
pysubgroup.representations module¶
- class pysubgroup.representations.BitSetRepresentation(df, selectors_to_patch)[source]¶
Bases:
RepresentationBase
- Conjunction¶
alias of
BitSet_Conjunction
- Disjunction¶
alias of
BitSet_Disjunction
- class pysubgroup.representations.BitSet_Conjunction(*args, **kwargs)[source]¶
Bases:
Conjunction
- n_instances = 0¶
- property size_sg¶
- class pysubgroup.representations.BitSet_Disjunction(*args, **kwargs)[source]¶
Bases:
Disjunction
- property size_sg¶
- class pysubgroup.representations.NumpySetRepresentation(df, selectors_to_patch)[source]¶
Bases:
RepresentationBase
- Conjunction¶
alias of
NumpySet_Conjunction
- class pysubgroup.representations.NumpySet_Conjunction(*args, **kwargs)[source]¶
Bases:
Conjunction
- all_set = None¶
- property size_sg¶
- class pysubgroup.representations.RepresentationBase(new_conjunction, selectors_to_patch)[source]¶
Bases:
object
- class pysubgroup.representations.SetRepresentation(df, selectors_to_patch)[source]¶
Bases:
RepresentationBase
- Conjunction¶
alias of
Set_Conjunction
pysubgroup.subgroup_description module¶
Created on 28.04.2016
@author: lemmerfn
- class pysubgroup.subgroup_description.Conjunction(selectors)[source]¶
Bases:
BooleanExpressionBase
- property depth¶
- property selectors¶
- class pysubgroup.subgroup_description.DNF(selectors=None)[source]¶
Bases:
Disjunction
- class pysubgroup.subgroup_description.Disjunction(selectors=None)[source]¶
Bases:
BooleanExpressionBase
- property selectors¶
- class pysubgroup.subgroup_description.EqualitySelector(*args, **kwargs)[source]¶
Bases:
SelectorBase
- property attribute_name¶
- property attribute_value¶
- property selectors¶
- class pysubgroup.subgroup_description.IntervalSelector(*args, **kwargs)[source]¶
Bases:
SelectorBase
- property attribute_name¶
- classmethod compute_descriptions(attribute_name, lower_bound, upper_bound, selector_name=None)[source]¶
- property lower_bound¶
- property selectors¶
- property upper_bound¶
- class pysubgroup.subgroup_description.NegatedSelector(*args, **kwargs)[source]¶
Bases:
SelectorBase
- property attribute_name¶
- property selectors¶
- pysubgroup.subgroup_description.create_nominal_selectors_for_attribute(data, attribute_name, dtypes=None)[source]¶
- pysubgroup.subgroup_description.create_numeric_selectors(data, nbins=5, intervals_only=True, weighting_attribute=None, ignore=None)[source]¶
- pysubgroup.subgroup_description.create_numeric_selectors_for_attribute(data, attr_name, nbins=5, intervals_only=True, weighting_attribute=None)[source]¶
- pysubgroup.subgroup_description.create_selectors(data, nbins=5, intervals_only=True, ignore=None)[source]¶
pysubgroup.utils module¶
Created on 02.05.2016
@author: lemmerfn
- pysubgroup.utils.add_if_required(result, sg, quality, task: SubgroupDiscoveryTask, check_for_duplicates=False, statistics=None, explicit_result_set_size=None)[source]¶
Important
Only add/remove subgroups from result by using heappop and heappush to ensure order of subgroups by quality.
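The reason for this rule is that the result list is treated as a heap, so appending or deleting entries directly would break the heap invariant. A minimal sketch of the idea with the standard heapq module follows; `add_if_better`, the tuple layout (quality, subgroup) and the string subgroups are hypothetical and stand in for whatever the library actually stores.

```python
import heapq


def add_if_better(result, quality, subgroup, result_set_size):
    """Hypothetical helper: keep the result_set_size best entries in a min-heap."""
    if len(result) < result_set_size:
        # Still room: push keeps the worst entry at result[0].
        heapq.heappush(result, (quality, subgroup))
    elif quality > result[0][0]:
        # Full: drop the current worst entry, then add the better one.
        heapq.heappop(result)
        heapq.heappush(result, (quality, subgroup))


result = []
for q, sg in [(0.1, "a"), (0.5, "b"), (0.3, "c"), (0.05, "d")]:
    add_if_better(result, q, sg, result_set_size=2)
```

Because result[0] is always the lowest-quality entry, checking whether a new subgroup belongs in the result set takes a single comparison.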
pysubgroup.visualization module¶
- pysubgroup.visualization.plot_roc(result_df, data, qf=<pysubgroup.binary_target.StandardQF object>, levels=40, annotate=False)[source]¶