Changelog¶
[0.7.6] - 2020-05-20¶
Some internal changes to the continuous integration pipeline on top of version 0.7.6.
[0.7.5] - 2020-05-20¶
Moved to pyscaffold, src/test structure and GitHub Actions.
[0.7.1] - 2020-05-20¶
Added¶
you can now additionally provide constraints to SubgroupDiscovery
MinSupportConstraint added
you can now run the slow tests py passing
--runslow
to pytestConjunction
,Disjunction
and Selectors now all have the public property.selectors
that provides all basic selectors involved
Removed¶
support for weights has been removed, it will probably be added in the future as seperate targets and Quality functions.
Changed¶
create_numeric_selector_for_attribute
has been renamed tocreate_numeric_selectors_for_attribute
(inserting ans
) This brings it in lign with the corresponding name shema for nominal.
Changed internally¶
statistics are now also store along with score and description
The function
ps.get_cover_array_and_size
was added, it allows for a consistent way to acces a cover array (a.k.a. sth to be thrown into a dataframe or a numpy array)algorithm tests now also call the
to_subgroups
andto_dataframe
methods to check they work with that algorithmthe order of
calculate_statistics
andget_base_statistics
are now in lign with that of quality functions (first subgroup then data)the size of a subgroup specified in a statistics object is now called
size_sg
uniformly. This avoids confusion with thesize
attribute of numpy arrays etc.
[0.7.0] - 2020-04-24¶
This update prepares pysubgroup for a better future. To do so we had to break backwards compatibility. Many of the classes that you know and love have been renamed so as to make their purpose more clear.
Changed:¶
SubgroupDescription is now called Conjunction
NominalTarget is now called BinaryTarget
algorithms now return a SubgroupDiscoveryResult object
the structure of quality functions changed (see documentation for more info)
Added¶
pysubgroup now has a bunch of tests
some algorithms and quality functions support numba for just in time compilation
ModelTarget
gp-growth
3 types of Representations (bitset, set, numpy-set)
Refinement operator
Disjunction
New algorithms
[0.6.2.1] - 2019-20-11¶
Added¶
Apriori now has the option to disable numba using the use_numba flag
SimpleSrach now has a progressbar (enabled via the show_progress=True flag)
The number of quality function evaluations can now be tracked using the CountCallsInterestingMeasure as a wrapper
StandardQfNumeric now offers three different options to calculate the optimistic estimate
‘sum’ (default) sums the values larger then the dataset mean (cf. Lemmerich 2014 p. 81 top)
‘average’ uses the maximum target values as estimate (cf. Lemmerich 2014 p. 82 center)
‘order’ uses ordering based bounds (cf. Lemmerich 2014 p. 89 bottom)
Bugfix¶
Apriori now calculates the constant statistics before using representation
DFS now properly works with any quality function
Improvements¶
Apriori now reuses the compiled numba function
Nominal target now uses subgroup.size to access the size of a subgroup representation
StaticSpecializationOperator now avoids checking refinements of the same attribute
test_algorithms_numeric now checks more algorithms
[0.6.2] - 2019-31-10¶
Changed¶
SubgroupDescription has been replaced with Conjunction
Selector .covers function returns a numpy array instead of a pandas Series (speedup on dense data)
Conjunction .selectors is renamed to Conjunction._selectors
quality functions have a different interface
calculate_constant_statistics(self, task) caches necessary precomputation
calculate_statistics(self, subgroup, data=None) returns a namedtuple with necessary statistics
evaluate(self, subgroup, statistics=None) computes quality from provided statistics
optimistic_estimate(self, subgroup, statistics=None) computes optimistic estimate from provided statistics
Added¶
Conjunction (replaces SubgroupDescription)
Disjunction
DNF (Disjunctive Normal Form)
representations (given a dataset selectors are queried only once)
BitsetRepresentation
SetRepresentation
NumpySetRepresentation
SimpleSearch algorithm
DFS (Depth first search) using a representation for StandardQF
tests
access to datasets for testing is provided through DataSets class
tests for selector classes (NominalSelector, NumericSelector)
__eq__
__lt__
__hash__ similarity
uniqueness of selectors
cover function for NominalSelector
tests for Conjunction, Disjuntion
__eq__
__lt__
__hash__ similarity
cover
tests for algorithms with nominal target concept on the creditg dataset (StandardQF(1) + NominalSearchSpace, StandardQF(1)+Nominal&Numeric Searchspace, StandardQF(0.5)+Nominal&Numeric Searchspace)
Apriori
SimpleDFS
BeamSearch
DFS_bitset
DFS_set
DFS_numpy_sets
SimpleSearch
tests for algorithms with numeric target concept (StandardQFNumeric)
Apriori
SimpleDFS
DFSNumeric
tests for algorithm with fi target (CountQF)
Apriori
DFS
tests for algorithms to find the best Disjunctions
Apriori
Generalising BFS
Improvements¶
Apriori algorithm now runs significantly faster due to precomputing and usage of list comprehension