Changelog

[0.7.6] - 2020-05-20

Some internal changes to the continuous integration pipeline on top of version 0.7.5.

[0.7.5] - 2020-05-20

Moved to PyScaffold, a src/test directory structure and GitHub Actions.

[0.7.1] - 2020-05-20

Added

you can now additionally provide constraints to SubgroupDiscovery
- MinSupportConstraint added
you can now run the slow tests by passing --runslow to pytest
Conjunction, Disjunction and Selectors now all have the public property .selectors that returns all basic selectors involved
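As a rough illustration of what a minimum-support constraint does, the sketch below discards candidate subgroups that cover too few instances. The class name MinSupportSketch, its is_satisfied method, and the representation of a subgroup's cover as a list of booleans are assumptions made for this example; they are not pysubgroup's actual MinSupportConstraint API.

```python
from typing import Dict, List, Sequence


class MinSupportSketch:
    """Hypothetical constraint: a subgroup must cover at least `min_support` rows."""

    def __init__(self, min_support: int) -> None:
        self.min_support = min_support

    def is_satisfied(self, cover: Sequence[bool]) -> bool:
        # `cover` is assumed to be a boolean mask with one entry per dataset row
        return sum(bool(x) for x in cover) >= self.min_support


def filter_candidates(candidates: Dict[str, List[bool]], constraints) -> Dict[str, List[bool]]:
    """Keep only candidate subgroups whose cover satisfies every constraint."""
    return {
        name: cover
        for name, cover in candidates.items()
        if all(c.is_satisfied(cover) for c in constraints)
    }


candidates = {
    "age<30": [True, True, False, False, True],        # covers 3 rows
    "job=='manager'": [False, False, True, False, False],  # covers 1 row
}
kept = filter_candidates(candidates, [MinSupportSketch(min_support=2)])
print(sorted(kept))  # ['age<30']
```

Checking constraints on the cover before computing any quality keeps the filter independent of the quality function in use.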

Removed

support for weights has been removed; it will probably be added back in the future as separate targets and quality functions.

Changed

create_numeric_selector_for_attribute has been renamed to create_numeric_selectors_for_attribute (an s was inserted). This brings it in line with the naming scheme of the corresponding nominal function.

Changed internally

statistics are now also stored along with score and description
The function ps.get_cover_array_and_size was added; it provides a consistent way to access a cover array (i.e. something that can be used to index a DataFrame or a numpy array)
algorithm tests now also call the to_subgroups and to_dataframe methods to check that they work with each algorithm
the argument order of calculate_statistics and get_base_statistics is now in line with that of quality functions (first subgroup, then data)
the size of a subgroup stored in a statistics object is now uniformly called size_sg. This avoids confusion with the size attribute of numpy arrays and similar objects.
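A minimal sketch of what a cover-array helper might do: normalize the different ways a subgroup's cover can be expressed (a boolean mask, integer row indices, or a slice) into something usable as a numpy index, together with the number of covered rows. The function name cover_array_and_size_sketch and the set of accepted inputs are assumptions; the real ps.get_cover_array_and_size may accept other inputs and behave differently. The last line also shows why a separate size_sg name helps: a boolean mask's own .size is the number of rows, not the number of covered rows.

```python
import numpy as np


def cover_array_and_size_sketch(cover, n_rows):
    """Hypothetical helper: return (index usable on a length-n_rows array, covered size)."""
    if isinstance(cover, slice):
        # A slice selects a range of rows; resolve its bounds against n_rows
        return cover, len(range(*cover.indices(n_rows)))
    arr = np.asarray(cover)
    if arr.dtype == bool:
        # Boolean mask: one entry per row, True means "covered"
        if arr.shape != (n_rows,):
            raise ValueError("a boolean cover needs exactly one entry per row")
        return arr, int(np.count_nonzero(arr))
    if np.issubdtype(arr.dtype, np.integer):
        # Integer row indices, assumed to contain no duplicates
        return arr, int(arr.size)
    raise TypeError(f"unsupported cover type: {type(cover).__name__}")


data = np.array([10, 20, 30, 40])
mask = np.array([True, False, True, False])
cover, size_sg = cover_array_and_size_sketch(mask, len(data))
print(data[cover])         # [10 30]
print(size_sg, mask.size)  # 2 4  (mask.size counts all rows, not covered ones)
```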

[0.7.0] - 2020-04-24

This update prepares pysubgroup for a better future. To do so, we had to break backwards compatibility. Many of the classes that you know and love have been renamed to make their purpose clearer.

Changed

SubgroupDescription is now called Conjunction
NominalTarget is now called BinaryTarget
algorithms now return a SubgroupDiscoveryResult object
the structure of quality functions has changed (see the documentation for more information)

Added

pysubgroup now has a bunch of tests
some algorithms and quality functions support numba for just-in-time compilation
ModelTarget
gp-growth
three types of representations (bitset, set, numpy-set)
Refinement operator
Disjunction
New algorithms
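To illustrate the idea behind the three representation types, here is a standalone sketch that stores the same selector covers as a bitset (a Python int), as a Python set of row indices, and as a sorted numpy index array, and computes the cover of a conjunction in each. The helper names are hypothetical and only mirror the concept; they are not pysubgroup's representation classes.

```python
import numpy as np


def to_bitset(mask):
    """Encode a boolean mask as an int whose bit i is set when row i is covered."""
    bits = 0
    for i, covered in enumerate(mask):
        if covered:
            bits |= 1 << i
    return bits


def to_set(mask):
    """Encode a boolean mask as a set of covered row indices."""
    return {i for i, covered in enumerate(mask) if covered}


def to_numpy_set(mask):
    """Encode a boolean mask as a sorted array of covered row indices."""
    return np.flatnonzero(np.asarray(mask, dtype=bool))


# Covers of two selectors over a 6-row dataset (made-up example data)
sel_a = [True, True, False, True, False, True]
sel_b = [True, False, False, True, True, True]

# The cover of a conjunction is the intersection of the selector covers
conj_bitset = to_bitset(sel_a) & to_bitset(sel_b)
conj_set = to_set(sel_a) & to_set(sel_b)
conj_numpy = np.intersect1d(to_numpy_set(sel_a), to_numpy_set(sel_b), assume_unique=True)

# Subgroup size in each representation
print(bin(conj_bitset).count("1"), len(conj_set), conj_numpy.size)  # 3 3 3
```

Bitsets make intersection a single bitwise AND and suit dense covers, while index sets only store covered rows and suit sparse ones.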

[0.6.2.1] - 2019-11-20

Added

Apriori now has the option to disable numba using the use_numba flag
SimpleSearch now has a progress bar (enabled via the show_progress=True flag)
The number of quality function evaluations can now be tracked using the CountCallsInterestingMeasure as a wrapper
StandardQFNumeric now offers three different options to calculate the optimistic estimate
- ‘sum’ (default) sums the values larger than the dataset mean (cf. Lemmerich 2014 p. 81 top)
- ‘average’ uses the maximum target value as the estimate (cf. Lemmerich 2014 p. 82 center)
- ‘order’ uses ordering-based bounds (cf. Lemmerich 2014 p. 89 bottom)
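For intuition about how the three options differ, the following sketch computes each bound for a numeric quality function of the form n_sg**a * (mean_sg - mean_data), with the exponent a assumed to lie in [0, 1]. It is reconstructed from the one-line descriptions above rather than taken from pysubgroup's code: here the 'sum' bound adds up the amounts by which target values exceed the dataset mean, and the function name optimistic_estimates_sketch is hypothetical.

```python
def optimistic_estimates_sketch(values_sg, mean_data, a=1.0):
    """Upper bounds on k**a * (mean - mean_data) over all subsets of a subgroup.

    values_sg: target values of the instances in the subgroup
    mean_data: target mean of the whole dataset
    a: size exponent, assumed to lie in [0, 1]
    """
    if not values_sg:
        return {"sum": 0.0, "average": 0.0, "order": 0.0}
    deviations = sorted((v - mean_data for v in values_sg), reverse=True)

    # 'sum': add up the deviations of all values larger than the dataset mean
    sum_bound = sum(d for d in deviations if d > 0)

    # 'average': pretend every instance had the maximum target value
    average_bound = max(0.0, len(values_sg) ** a * deviations[0])

    # 'order': best prefix of the values sorted in descending order
    order_bound = 0.0
    running = 0.0
    for k, d in enumerate(deviations, start=1):
        running += d
        # k**a * (mean of top-k values - mean_data) equals k**(a - 1) * running
        order_bound = max(order_bound, k ** (a - 1) * running)

    return {"sum": sum_bound, "average": average_bound, "order": order_bound}


print(optimistic_estimates_sketch([1, 2, 3, 10], mean_data=4.0, a=1.0))
# {'sum': 6.0, 'average': 24.0, 'order': 6.0}
```

For 0 <= a <= 1 the 'order' bound is never larger than either of the other two, and for a = 1 it coincides with 'sum'; the trade-off is that it needs the subgroup's values in sorted order.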

Bugfix

Apriori now calculates the constant statistics before using the representation
DFS now properly works with any quality function

Improvements

Apriori now reuses the compiled numba function
Nominal target now uses subgroup.size to access the size of a subgroup representation
StaticSpecializationOperator now avoids checking refinements of the same attribute
test_algorithms_numeric now checks more algorithms
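One way to avoid refining a description with a second selector on an attribute it already uses is to visit attributes in a fixed order and only extend with attributes that come after the last one used. The sketch below models selectors as (attribute, value) tuples and descriptions as tuples of selectors; these types and the refinements function are assumptions for illustration, not pysubgroup's StaticSpecializationOperator.

```python
from typing import Dict, Iterator, List, Tuple

Selector = Tuple[str, str]           # hypothetical (attribute, value) pair
Description = Tuple[Selector, ...]   # hypothetical conjunction of selectors


def refinements(description: Description,
                selectors_by_attribute: Dict[str, List[Selector]],
                attribute_order: List[str]) -> Iterator[Description]:
    """Yield conjunctions that extend `description` by one selector.

    Attributes are visited in a fixed order, and only attributes that come after
    the last attribute already used are considered. A description is therefore
    never refined with a second selector on an attribute it already contains,
    and each conjunction is generated at most once.
    """
    used_positions = [attribute_order.index(attr) for attr, _ in description]
    start = max(used_positions, default=-1) + 1
    for attr in attribute_order[start:]:
        for selector in selectors_by_attribute[attr]:
            yield description + (selector,)


selectors = {
    "sex": [("sex", "m"), ("sex", "f")],
    "job": [("job", "a"), ("job", "b")],
}
order = ["sex", "job"]
print(len(list(refinements((), selectors, order))))          # 4
print(list(refinements((("sex", "m"),), selectors, order)))
# [(('sex', 'm'), ('job', 'a')), (('sex', 'm'), ('job', 'b'))]
print(list(refinements((("job", "a"),), selectors, order)))  # []
```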

[0.6.2] - 2019-10-31

Changed

SubgroupDescription has been replaced with Conjunction
the Selector .covers function now returns a numpy array instead of a pandas Series (speedup on dense data)
Conjunction .selectors is renamed to Conjunction._selectors
quality functions have a different interface:
- calculate_constant_statistics(self, task) caches the necessary precomputations
- calculate_statistics(self, subgroup, data=None) returns a namedtuple with the necessary statistics
- evaluate(self, subgroup, statistics=None) computes the quality from the provided statistics
- optimistic_estimate(self, subgroup, statistics=None) computes the optimistic estimate from the provided statistics
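A toy quality function following the four-method interface listed above might look like the sketch below. It implements a binary standard quality function in the form (n_sg / N)**a * (p_sg - p_0); representing a subgroup as a list of booleans, the ToyTask stand-in for a task, and the class name ToyStandardQF are simplifying assumptions, and the class is not pysubgroup's StandardQF. The statistics field name size_sg follows the naming adopted in 0.7.1.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import List, Optional, Sequence

Stats = namedtuple("Stats", ["size_sg", "positives_sg"])


@dataclass
class ToyTask:
    """Hypothetical stand-in for a discovery task: just a binary target column."""
    target: List[int]


class ToyStandardQF:
    """Sketch of the interface: constant stats, per-subgroup stats, evaluate, estimate."""

    def __init__(self, a: float) -> None:
        self.a = a
        self.target: List[int] = []
        self.n_data = 0
        self.p_data = 0.0

    def calculate_constant_statistics(self, task: ToyTask) -> None:
        # Cache everything that does not depend on the subgroup
        self.target = list(task.target)
        self.n_data = len(self.target)
        self.p_data = sum(self.target) / self.n_data

    def calculate_statistics(self, subgroup: Sequence[bool], data=None) -> Stats:
        target = self.target if data is None else data
        size_sg = sum(1 for covered in subgroup if covered)
        positives_sg = sum(t for covered, t in zip(subgroup, target) if covered)
        return Stats(size_sg, positives_sg)

    def evaluate(self, subgroup, statistics: Optional[Stats] = None) -> float:
        s = statistics if statistics is not None else self.calculate_statistics(subgroup)
        if s.size_sg == 0:
            return 0.0
        p_sg = s.positives_sg / s.size_sg
        return (s.size_sg / self.n_data) ** self.a * (p_sg - self.p_data)

    def optimistic_estimate(self, subgroup, statistics: Optional[Stats] = None) -> float:
        # Best possible refinement keeps only the subgroup's positives (valid for 0 <= a <= 1)
        s = statistics if statistics is not None else self.calculate_statistics(subgroup)
        return (s.positives_sg / self.n_data) ** self.a * (1 - self.p_data)


qf = ToyStandardQF(a=1.0)
qf.calculate_constant_statistics(ToyTask(target=[1, 1, 0, 0]))
cover = [True, True, True, False]
stats = qf.calculate_statistics(cover)
print(stats)                                  # Stats(size_sg=3, positives_sg=2)
print(round(qf.evaluate(cover, stats), 3))    # 0.125
print(qf.optimistic_estimate(cover, stats))   # 0.25
```

Splitting statistics from evaluation lets a search algorithm compute the statistics once and reuse them for both the quality and the optimistic estimate.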

Added

Conjunction (replaces SubgroupDescription)
Disjunction
DNF (Disjunctive Normal Form)
representations (given a dataset, selectors are queried only once)
- BitsetRepresentation
- SetRepresentation
- NumpySetRepresentation
SimpleSearch algorithm
DFS (depth-first search) using a representation for StandardQF
tests
- access to datasets for testing is provided through DataSets class
- tests for selector classes (NominalSelector, NumericSelector)
  - __eq__
  - __lt__
  - __hash__ similarity
  - uniqueness of selectors
  - cover function for NominalSelector
- tests for Conjunction, Disjunction
  - __eq__
  - __lt__
  - __hash__ similarity
  - cover
- tests for algorithms with a nominal target concept on the creditg dataset (StandardQF(1) + nominal search space, StandardQF(1) + nominal & numeric search space, StandardQF(0.5) + nominal & numeric search space)
  - Apriori
  - SimpleDFS
  - BeamSearch
  - DFS_bitset
  - DFS_set
  - DFS_numpy_sets
  - SimpleSearch
- tests for algorithms with numeric target concept (StandardQFNumeric)
  - Apriori
  - SimpleDFS
  - DFSNumeric
- tests for algorithms with an FI target (CountQF)
  - Apriori
  - DFS
- tests for algorithms to find the best Disjunctions
  - Apriori
  - Generalising BFS
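The selector properties these tests check (equality, ordering, hashes consistent with equality, and a cover that is a numpy array rather than a pandas Series) can be illustrated with a small standalone class. NominalSelectorSketch and its fields are hypothetical and not pysubgroup's NominalSelector; the dataclass options do the work of generating __eq__, __lt__ and __hash__.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, order=True)
class NominalSelectorSketch:
    """Hypothetical `attribute == value` selector.

    frozen=True makes instances immutable and generates a __hash__ consistent
    with the generated __eq__; order=True generates __lt__ and friends, which
    compare (attribute_name, attribute_value) tuples.
    """
    attribute_name: str
    attribute_value: str

    def covers(self, data) -> np.ndarray:
        # `data` may be a pandas DataFrame or any mapping of column name -> values;
        # the result is a plain boolean numpy array rather than a pandas Series
        return np.asarray(data[self.attribute_name]) == self.attribute_value


data = {"color": ["red", "blue", "red"], "size": ["S", "M", "L"]}
a = NominalSelectorSketch("color", "red")
b = NominalSelectorSketch("color", "red")
c = NominalSelectorSketch("color", "blue")

print(a == b, hash(a) == hash(b))  # True True
print(c < a)                       # True, because "blue" sorts before "red"
print(len({a, b, c}))              # 2, equal selectors collapse in a set
print(a.covers(data))              # [ True False  True]
```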

Improvements

The Apriori algorithm now runs significantly faster due to precomputation and the use of list comprehensions
