TabPFNClassifier
Bases: TabPFNBaseModel, ClassifierMixin
__init__
__init__(
    model_path: str | Path = Path(local_model_path)
    / "model_hans_classification.ckpt",
    n_estimators: int = 4,
    preprocess_transforms: Tuple[
        PreprocessorConfig, ...
    ] = (
        PreprocessorConfig(
            "quantile_uni_coarse",
            append_original=True,
            categorical_name="ordinal_very_common_categories_shuffled",
            global_transformer_name="svd",
            subsample_features=-1,
        ),
        PreprocessorConfig(
            "none",
            categorical_name="numeric",
            subsample_features=-1,
        ),
    ),
    feature_shift_decoder: str = "shuffle",
    normalize_with_test: bool = False,
    average_logits: bool = False,
    optimize_metric: ClassificationOptimizationMetricType = "roc",
    transformer_predict_kwargs: Optional[Dict] = None,
    multiclass_decoder="shuffle",
    softmax_temperature: Optional[float] = math.exp(-0.1),
    use_poly_features: bool = False,
    max_poly_features: int = 50,
    transductive: bool = False,
    remove_outliers: float = 12.0,
    add_fingerprint_features: bool = True,
    subsample_samples: float = -1,
    model: Optional[Module] = None,
    model_config: Optional[Dict] = None,
    fit_at_predict_time: bool = True,
    device: Literal["cuda", "cpu", "auto"] = "auto",
    seed: Optional[int] = 0,
    show_progress: bool = True,
    batch_size_inference: int = 1,
    fp16_inference: bool = True,
    save_peak_memory: Literal[
        "True", "False", "auto"
    ] = "True",
    maximum_free_memory_in_gb: Optional[float] = None,
    split_test_samples: float | str = 1,
)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_path | str \| Path | The path to the model checkpoint. | Path(local_model_path) / 'model_hans_classification.ckpt' |
n_estimators | int | The number of ensemble configurations to use; this is the most important setting. | 4 |
preprocess_transforms | Tuple[PreprocessorConfig, ...] | The preprocessing steps to use, given as a tuple of PreprocessorConfig objects. Steps can be named with strings of the form '(none\|power\|quantile\|robust)[_all][_and_none]', where the first part specifies the preprocessing step, the second part specifies the features to apply it to, and '_and_none' specifies that the original features are added back unchanged. Finally, you can combine all strings without … | (PreprocessorConfig('quantile_uni_coarse', append_original=True, categorical_name='ordinal_very_common_categories_shuffled', global_transformer_name='svd', subsample_features=-1), PreprocessorConfig('none', categorical_name='numeric', subsample_features=-1)) |
feature_shift_decoder | str | Whether and how to shift features for each ensemble configuration. One of "shuffle", "none", "local_shuffle", "rotate", "auto_rotate". | 'shuffle' |
normalize_with_test | bool | If True, the test set is also used to normalize the data; otherwise, only the training set is used. | False |
average_logits | bool | Whether to average the logits (True) or the probabilities (False) of the ensemble members. | False |
optimize_metric | ClassificationOptimizationMetricType | The metric to optimize. | 'roc' |
transformer_predict_kwargs | Optional[Dict] | Additional keyword arguments passed to the transformer's predict method. | None |
multiclass_decoder |  | The multiclass decoder to use. | 'shuffle' |
softmax_temperature | Optional[float] | A log-spaced temperature, applied as logits <- logits / softmax_temperature. | exp(-0.1) |
use_poly_features | bool | Whether to use polynomial features as the last preprocessing step. | False |
max_poly_features | int | The maximum number of polynomial features to use; None means unlimited. | 50 |
transductive | bool | Whether to use transductive learning. | False |
remove_outliers | float | If nonzero, removes outliers from the input features: values lying more than remove_outliers standard deviations from the mean are removed. | 12.0 |
add_fingerprint_features | bool | If True, adds one feature of random values to the input features. This helps the transformer model distinguish duplicated samples. | True |
subsample_samples | float | When enabled, uses a random subset of the training samples for each ensemble configuration. A value of 1 or above subsamples to that number of samples; a value between 0 and 1 is interpreted as a fraction of the training set size. | -1 |
model | Optional[Module] | The model, if you want to specify it directly; used together with model_config. | None |
model_config | Optional[Dict] | The model config, if you want to specify it directly; used together with model. | None |
fit_at_predict_time | bool | Whether to fit the model lazily, i.e. only when it is needed for inference in predict or predict_proba. | True |
device | Literal['cuda', 'cpu', 'auto'] | The device to use for inference. "auto" uses CUDA if available, otherwise CPU. | 'auto' |
seed | Optional[int] | The default seed that determines the order of the ensemble configurations. If None, no fixed seed is used. | 0 |
show_progress | bool | Whether to show progress bars during training and inference. | True |
batch_size_inference | int | The batch size to use for inference. It does not affect the results, only memory usage and speed: a larger batch size is faster but uses more memory. Setting it to None determines the batch size automatically from the memory usage and the specified maximum free memory. | 1 |
fp16_inference | bool | Whether to use fp16 for inference on GPU; has no effect on CPU inference. | True |
save_peak_memory | Literal['True', 'False', 'auto'] | Whether to reduce the model's peak memory usage, which can allow datasets up to 8 times larger to fit into memory. "True" means always enabled, "False" means always disabled, and "auto" sets it based on the memory usage. | 'True' |
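The softmax_temperature and average_logits options together determine how the ensemble members' outputs become class probabilities. The NumPy sketch below illustrates the documented semantics, assuming each member produces logits of shape (n_samples, n_classes); aggregate_ensemble is a hypothetical helper, not part of TabPFN's API, and the library's actual implementation may differ.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the class axis for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def aggregate_ensemble(
    member_logits: np.ndarray,
    softmax_temperature: float = float(np.exp(-0.1)),
    average_logits: bool = False,
) -> np.ndarray:
    """Hypothetical helper: combine logits of shape (n_members, n_samples, n_classes)
    into probabilities of shape (n_samples, n_classes)."""
    # Documented rule: logits <- logits / softmax_temperature.
    scaled = member_logits / softmax_temperature
    if average_logits:
        # Average in logit space first, then normalize once.
        return softmax(scaled.mean(axis=0))
    # Normalize each member separately, then average the probabilities.
    return softmax(scaled).mean(axis=0)
```

For two members with logits [[4, 0]] and [[0, 1]] at the default temperature, averaging logits gives class 0 a probability of about 0.84, while averaging probabilities gives about 0.62, so the choice changes how confident the combined prediction is.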
fit
fit(
    X: Union[ndarray, Tensor],
    y: Union[ndarray, Tensor],
    additional_y: Optional[Dict[str, Tensor]] = None,
) -> TabPFNClassifier
Fits the TabPFNClassifier model to the input data X and y.
The actual training logic is delegated to the _fit method, which should be implemented by subclasses.
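This delegation follows the template-method pattern: the public fit performs shared work and returns self, while each subclass supplies _fit. A minimal sketch, assuming only what this page states; BaseModelSketch and MajorityClassSketch are hypothetical names, and the validation shown is illustrative rather than TabPFNBaseModel's actual checks.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseModelSketch(ABC):
    """Hypothetical base class: fit() does shared checks, _fit() holds the model logic."""

    def fit(self, X, y, additional_y=None):
        X = np.asarray(X)
        y = np.asarray(y)
        # Shared input validation lives in the base class.
        if X.ndim != 2:
            raise ValueError("X must have shape (n_samples, n_features)")
        if y.shape != (X.shape[0],):
            raise ValueError("y must have shape (n_samples,)")
        self._fit(X, y, additional_y)
        # Returning self enables chaining such as model.fit(X, y).predict(X_test).
        return self

    @abstractmethod
    def _fit(self, X, y, additional_y):
        """Subclasses implement the actual training logic here."""


class MajorityClassSketch(BaseModelSketch):
    """Toy subclass that memorizes the most frequent label."""

    def _fit(self, X, y, additional_y):
        classes, counts = np.unique(y, return_counts=True)
        self.majority_ = classes[np.argmax(counts)]

    def predict(self, X):
        return np.full(np.asarray(X).shape[0], self.majority_)
```

Because _fit is abstract, the base class cannot be instantiated on its own; only subclasses that implement it can be fitted.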
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Union[ndarray, Tensor] | The input feature matrix of shape (n_samples, n_features). | required |
y | Union[ndarray, Tensor] | The target labels of shape (n_samples,). | required |
additional_y | Optional[Dict[str, Tensor]] | Additional labels to use during training. | None |
Returns:
Name | Type | Description |
---|---|---|
TabPFNClassifier | TabPFNClassifier | The fitted model object (self). |
predict
Predict the class labels for the input samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like | The input samples. | required |
return_winning_probability | bool | Whether to return the winning probability. | False |
Returns:
Type | Description |
---|---|
array | The predicted class labels. |
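A common way for a classifier to derive labels is to take the most probable class per sample. The sketch below assumes predict works this way on the output of predict_proba and that the "winning probability" is the probability of that arg-max class; labels_from_proba is a hypothetical helper, and returning a (labels, probabilities) tuple is an assumption, since this page does not document the return shape when return_winning_probability=True.

```python
import numpy as np


def labels_from_proba(proba, classes, return_winning_probability=False):
    """Hypothetical helper: map class probabilities of shape (n_samples, n_classes)
    to labels taken from `classes` (one label per column)."""
    proba = np.asarray(proba)
    # Column index of the most probable class in each row.
    winners = proba.argmax(axis=1)
    labels = np.asarray(classes)[winners]
    if return_winning_probability:
        # Probability assigned to the winning class in each row.
        return labels, proba[np.arange(proba.shape[0]), winners]
    return labels
```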
predict_proba
Calls the transformer to predict the class probabilities of the test inputs X, given the previously set training dataset (via fit).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X |  | The test data points. | required |
predict_y_proba
Predict the probability of the target labels y given the input samples X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like | The input samples. | required |
y | array-like | The target labels. | required |
Returns:
Type | Description |
---|---|
array | The predicted probabilities of the target labels. |
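If predict_y_proba returns, for each sample, the probability assigned to that sample's given label (an assumption; this page only says it predicts "the probability of the target labels"), the operation amounts to indexing the class-probability matrix with each label's column. proba_of_labels is a hypothetical helper and assumes the column labels are sorted, as np.unique returns them.

```python
import numpy as np


def proba_of_labels(proba, classes, y):
    """Hypothetical helper: return p(y_i | x_i) for each sample i.

    proba has shape (n_samples, n_classes); classes lists the column labels in
    sorted order; y holds one label per sample.
    """
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    y = np.asarray(y)
    # Locate each label's column; clip so unseen labels can be detected safely.
    idx = np.clip(np.searchsorted(classes, y), 0, classes.size - 1)
    if not np.array_equal(classes[idx], y):
        raise ValueError("y contains labels that are not in classes")
    # Advanced indexing picks one entry per row: proba[i, idx[i]].
    return proba[np.arange(y.shape[0]), idx]
```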
set_categorical_features
Set the categorical features to use for the model.
These categorical features might be overridden by the preprocessing steps. This is controlled by:
i) max_unique_values_as_categorical_feature: the maximum number of unique values a feature can have to be considered categorical. Features with more unique values are considered numerical.
ii) min_unique_values_as_numerical_feature: the minimum number of unique values a feature must have to be considered numerical. Features with fewer unique values are considered categorical.
:param categorical_features: The feature indices of the categorical features.
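The two override rules can be sketched as follows. This is one plausible reading, assuming rule i) can demote a declared categorical feature to numerical and rule ii) can promote an undeclared feature to categorical; resolve_categorical is a hypothetical function, and how TabPFN's preprocessing actually combines the two thresholds is not stated on this page.

```python
import numpy as np


def resolve_categorical(
    X,
    categorical_features,
    max_unique_values_as_categorical_feature,
    min_unique_values_as_numerical_feature,
):
    """Hypothetical sketch: return the sorted indices of features treated as categorical."""
    X = np.asarray(X)
    declared = set(categorical_features)
    resolved = []
    for j in range(X.shape[1]):
        n_unique = np.unique(X[:, j]).size
        if j in declared:
            # Rule i): a declared categorical feature with too many unique values
            # is treated as numerical instead.
            is_categorical = n_unique <= max_unique_values_as_categorical_feature
        else:
            # Rule ii): an undeclared feature with fewer unique values than the
            # numerical minimum is treated as categorical instead.
            is_categorical = n_unique < min_unique_values_as_numerical_feature
        if is_categorical:
            resolved.append(j)
    return resolved
```

For example, with thresholds 3 and 2, a declared column with 4 distinct values would be treated as numerical, while an undeclared constant column would be treated as categorical.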
score
Compute the score of the model on the given test data and labels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like | The input samples. | required |
y | array-like | The true labels for the input samples. | required |
sample_weight | array-like | Sample weights. | None |
Returns:
Type | Description |
---|---|
float | The computed score. |
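TabPFNClassifier derives from scikit-learn's ClassifierMixin, whose score method returns the mean accuracy of predict(X) against y, weighted by sample_weight when it is given. Assuming this class does not override that behavior (not confirmed on this page), the computed score matches the sketch below; weighted_accuracy is a hypothetical helper.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: fraction of correct predictions, optionally weighted."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # 1.0 where the prediction matches the true label, 0.0 otherwise.
    correct = (y_true == y_pred).astype(float)
    # np.average with weights=None reduces to the plain mean.
    return float(np.average(correct, weights=sample_weight))
```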
estimate_memory_usage
estimate_memory_usage(
    X: ndarray | tensor,
    unit: Literal["b", "mb", "gb"] = "gb",
    eval_position: int = -1,
    **overwrite_params
) -> float | None
Estimates the memory usage of the model.
With save_peak_mem_factor, the peak memory estimate is accurate on average in O(n_feats, n_samples), but has significant outliers (2x). The calculation also excludes baseline usage and constant offsets. Baseline memory usage can be ignored if the maximum memory usage is left at its default of None, which uses the system's free memory. The constant offsets are insignificant for large datasets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | The feature matrix. X should represent the concatenation of the training and test data if … | required |
unit | Literal['b', 'mb', 'gb'] | The unit to return the memory usage in (bytes, megabytes, or gigabytes). | 'gb' |
Returns:
Type | Description |
---|---|
float \| None | The estimated memory usage in the requested unit. |
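Because X should combine the training and test data, a typical preparation step stacks the two along the sample axis. The sketch below also converts a raw byte count into the units accepted by unit; combined_features and convert_bytes are hypothetical helpers, and the 1024-based conversion factors are an assumption, since this page does not say whether "mb" and "gb" are decimal or binary units.

```python
import numpy as np

# Assumed binary (1024-based) units; the source does not specify the convention.
_UNIT_FACTORS = {"b": 1, "mb": 1024 ** 2, "gb": 1024 ** 3}


def convert_bytes(n_bytes, unit="gb"):
    """Hypothetical helper: express a byte count in one of the supported units."""
    if unit not in _UNIT_FACTORS:
        raise ValueError(f"unit must be one of {sorted(_UNIT_FACTORS)}")
    return n_bytes / _UNIT_FACTORS[unit]


def combined_features(X_train, X_test):
    """Hypothetical helper: stack train and test rows into a single matrix."""
    X_train = np.asarray(X_train)
    X_test = np.asarray(X_test)
    if X_train.shape[1] != X_test.shape[1]:
        raise ValueError("train and test must have the same number of features")
    return np.concatenate([X_train, X_test], axis=0)
```

For a configured classifier clf, the stacked matrix would then be passed as clf.estimate_memory_usage(combined_features(X_train, X_test), unit="gb"); per the signature, the result may be None.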
estimate_computation_usage
estimate_computation_usage(
    X: ndarray,
    unit: Literal[
        "sequential_flops", "s"
    ] = "sequential_flops",
    eval_position: int = -1,
    **overwrite_params
) -> float | None
Estimates the sequential computation usage of the model, i.e. the operations that cannot be parallelized and are therefore the main bottleneck for computation time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | The feature matrix. X should represent the concatenation of the training and test data if … | required |
unit | Literal['sequential_flops', 's'] | The unit to return the computation usage in. | 'sequential_flops' |
Returns:
Type | Description |
---|---|
float \| None | The estimated computation usage in the chosen unit. |