TabPFNRegressor

Bases: TabPFNBaseModel, RegressorMixin

__init__

__init__(
    model_path: str | Path = str(
        Path(local_model_path).resolve()
        / "model_hans_regression.ckpt"
    ),
    n_estimators: int = 8,
    preprocess_transforms: Tuple[
        PreprocessorConfig, ...
    ] = (
        PreprocessorConfig(
            "quantile_uni",
            append_original=True,
            categorical_name="ordinal_very_common_categories_shuffled",
            global_transformer_name="svd",
        ),
        PreprocessorConfig(
            "safepower", categorical_name="onehot"
        ),
    ),
    feature_shift_decoder: str = "shuffle",
    normalize_with_test: bool = False,
    average_logits: bool = False,
    optimize_metric: RegressionOptimizationMetricType = "rmse",
    transformer_predict_kwargs: Optional[Dict] = None,
    softmax_temperature: Optional[float] = math.exp(-0.1),
    use_poly_features: bool = False,
    max_poly_features: int = 50,
    transductive: bool = False,
    remove_outliers=-1,
    regression_y_preprocess_transforms: Optional[
        Tuple[
            None
            | Literal[
                "safepower", "power", "quantile_norm"
            ],
            ...,
        ]
    ] = (None, "safepower"),
    add_fingerprint_features: bool = True,
    cancel_nan_borders: bool = True,
    super_bar_dist_averaging: bool = False,
    subsample_samples: float = -1,
    model: Optional[Module] = None,
    model_config: Optional[Dict] = None,
    fit_at_predict_time: bool = True,
    device: Literal["cuda", "cpu", "auto"] = "auto",
    seed: Optional[int] = 0,
    show_progress: bool = True,
    batch_size_inference: int = 1,
    fp16_inference: bool = True,
    save_peak_memory: Literal[
        "True", "False", "auto"
    ] = "True",
    maximum_free_memory_in_gb: Optional[float] = None,
    split_test_samples: float | str = 1,
)

Parameters:

Name Type Description Default
model_path str | Path

The path to the model checkpoint file.

str(resolve() / 'model_hans_regression.ckpt')
n_estimators int

The number of ensemble configurations to use. This is the most important setting.

8
preprocess_transforms Tuple[PreprocessorConfig, ...]

A tuple of strings specifying the preprocessing steps to use. Each element has the form '(none|power|quantile_norm|quantile_uni|quantile_uni_coarse|robust...)[_all][_and_none]'. The first part names the preprocessing step (see .preprocessing.ReshapeFeatureDistributionsStep.get_all_preprocessors()), the second part selects the features to apply it to, and '_and_none' specifies that the original, untransformed features are added back alongside the transformed ones. Strings without _all can also be combined with _onehot to apply one-hot encoding to the categorical features specified with self.fit(..., categorical_features=...).

(PreprocessorConfig('quantile_uni', append_original=True, categorical_name='ordinal_very_common_categories_shuffled', global_transformer_name='svd'), PreprocessorConfig('safepower', categorical_name='onehot'))
feature_shift_decoder str

Whether to shift features for each ensemble configuration. One of "shuffle", "none", "local_shuffle", "rotate", "auto_rotate".

'shuffle'
normalize_with_test bool

If True, the test set is used to normalize the data; otherwise only the training set is used.

False
average_logits bool

Whether to average logits or probabilities for ensemble members.

False
optimize_metric RegressionOptimizationMetricType

The optimization metric to use.

'rmse'
transformer_predict_kwargs Optional[Dict]

Additional keyword arguments to pass to the transformer predict method.

None
softmax_temperature Optional[float]

A log-spaced temperature. It is applied as logits <- logits/softmax_temperature.

exp(-0.1)
use_poly_features bool

Whether to use polynomial features as the last preprocessing step.

False
max_poly_features int

Maximum number of polynomial features to use. None means unlimited.

50
transductive bool

Whether to use transductive learning.

False
remove_outliers

If not 0.0, removes outliers from the input features: values that deviate by more than remove_outliers standard deviations are removed.

-1
regression_y_preprocess_transforms Optional[Tuple[None | Literal['safepower', 'power', 'quantile_norm'], ...]]

Preprocessing transforms for the target variable. Each entry can be one of the preprocessors from .preprocessing.ReshapeFeatureDistributionsStep.get_all_preprocessors(), e.g. "power", or None to leave the targets untransformed apart from a simple mean/variance normalization.

(None, 'safepower')
add_fingerprint_features bool

If True, adds one feature of random values to the input features. This helps the transformer model distinguish duplicated samples.

True
cancel_nan_borders bool

Whether to ignore buckets that are transformed to NaN values when inverting a regression_y_preprocess_transform. This should be set to True; only set it to False if you know what you are doing.

True
super_bar_dist_averaging bool

If we use regression_y_preprocess_transforms we need to average the predictions over the different configurations. The different configurations all come with different bar_distributions (Riemann distributions), though. The default is for us to aggregate all bar distributions using simply scaled borders in the bar distribution, scaled by the mean and std of the target variable. If you set this to True, a new bar distribution will be built using all the borders generated in the different configurations.

False
subsample_samples float

If not None, will use a random subset of the samples for training in each ensemble configuration. If 1 or above, this will subsample to the specified number of samples. If in 0 to 1, the value is viewed as a fraction of the training set size.

-1
model Optional[Module]

The model, if you want to specify it directly. This is used in combination with model_config.

None
model_config Optional[Dict]

The config, if you want to specify it directly. This is used in combination with model.

None
fit_at_predict_time bool

Whether to train the model lazily, i.e. only when it is needed for inference in predict[_proba].

True
device Literal['cuda', 'cpu', 'auto']

The device to use for inference. "auto" uses cuda if available, otherwise cpu.

'auto'
seed Optional[int]

The default seed to use for the order of the ensemble configurations, a seed of None will not.

0
show_progress bool

Whether to show progress bars during training and inference.

True
batch_size_inference int

The batch size to use for inference. This does not affect the results, only the memory usage and speed: a higher batch size is faster but uses more memory. Setting the batch size to None means that it is determined automatically from the memory usage and the maximum free memory specified with maximum_free_memory_in_gb.

1
fp16_inference bool

Whether to use fp16 for inference on GPU. This does not affect CPU inference.

True
save_peak_memory Literal['True', 'False', 'auto']

Whether to reduce the peak memory usage of the model, which can enable up to 8 times larger datasets to fit into memory. "True" means always enabled, "False" means always disabled, and "auto" means that it is set based on the memory usage.

'True'
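
The interplay of softmax_temperature and average_logits can be illustrated with a small sketch. This is not TabPFN's implementation: the helper names (softmax, combine_members) and the toy logits are assumptions made for illustration, and only the rule logits <- logits/softmax_temperature is taken from the parameter description.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits / temperature (shifted by the max for stability)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def combine_members(member_logits, temperature, average_logits):
    """Combine per-member logits into one distribution (hypothetical helper)."""
    n_members = len(member_logits)
    if average_logits:
        # Average in logit space first, then normalise once.
        mean_logits = [sum(col) / n_members for col in zip(*member_logits)]
        return softmax(mean_logits, temperature)
    # Normalise every member first, then average the probabilities.
    member_probs = [softmax(logits, temperature) for logits in member_logits]
    return [sum(col) / n_members for col in zip(*member_probs)]


member_logits = [[2.0, 0.5, -1.0], [0.0, 1.5, 0.2]]  # toy values, two members
temperature = math.exp(-0.1)  # the documented default, slightly below 1

from_probs = combine_members(member_logits, temperature, average_logits=False)
from_logits = combine_members(member_logits, temperature, average_logits=True)
print(from_probs)
print(from_logits)
```

The two modes generally give different distributions, and a temperature below 1 sharpens each member's distribution before the members are combined.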

fit

fit(X, y, additional_y=None) -> TabPFNBaseModel

Fits the model to the input data X and y.

The actual training logic is delegated to the _fit method, which should be implemented by subclasses.

Parameters:

Name Type Description Default
X Union[ndarray, Tensor]

The input feature matrix of shape (n_samples, n_features).

required
y Union[ndarray, Tensor]

The target labels of shape (n_samples,).

required
additional_y Optional[Dict[str, Tensor]]

Additional labels to use during training.

None

Returns:

Name Type Description
TabPFNBaseModel TabPFNBaseModel

The fitted model object (self).
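
The delegation from fit to _fit follows the familiar template-method pattern. The sketch below is illustrative only: SketchBaseModel and MeanRegressorSketch are hypothetical stand-ins rather than TabPFN classes, and the input check and the mean_ attribute are assumptions.

```python
class SketchBaseModel:
    """Hypothetical base class: the public fit() delegates to _fit()."""

    def fit(self, X, y, additional_y=None):
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of samples")
        self._fit(X, y, additional_y=additional_y)
        return self  # returning self allows model.fit(X, y).predict(X_new)

    def _fit(self, X, y, additional_y=None):
        raise NotImplementedError("subclasses implement the training logic")


class MeanRegressorSketch(SketchBaseModel):
    """Toy subclass that predicts the mean of the training targets."""

    def _fit(self, X, y, additional_y=None):
        self.mean_ = sum(y) / len(y)

    def predict(self, X, additional_y=None):
        return [self.mean_ for _ in X]


X_train = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
y_train = [1.0, 2.0, 3.0]

model = MeanRegressorSketch()
predictions = model.fit(X_train, y_train).predict([[5.0, 5.0]])
print(predictions)
```

Because fit returns the fitted object itself, calls can be chained in the usual scikit-learn style.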

predict

predict(X, additional_y=None) -> ndarray

predict_y_proba

predict_y_proba(
    X: ndarray | Tensor, y: ndarray | Tensor
) -> ndarray

Predicts the probability of the target y given the input X.
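
As a rough intuition for what such a probability can look like, the sketch below evaluates the density of a piecewise-constant ("bar") distribution at a target value. It assumes a bounded distribution with explicit bucket borders; the function bar_density, the borders, and the bucket masses are all hypothetical, and the actual computation in TabPFN may differ.

```python
from bisect import bisect_right


def bar_density(borders, bucket_probs, y):
    """Density of a piecewise-constant distribution at y (illustrative).

    borders: sorted bucket edges, with len(borders) == len(bucket_probs) + 1
    bucket_probs: probability mass of each bucket, summing to 1
    """
    if y < borders[0] or y > borders[-1]:
        return 0.0  # outside the support of this toy distribution
    # Find the bucket [left, right) containing y; the right-most border
    # is assigned to the last bucket.
    idx = min(bisect_right(borders, y) - 1, len(bucket_probs) - 1)
    width = borders[idx + 1] - borders[idx]
    # Mass spread uniformly over the bucket gives mass / width as density.
    return bucket_probs[idx] / width


borders = [0.0, 1.0, 3.0, 4.0]   # toy bucket edges
bucket_probs = [0.2, 0.6, 0.2]   # toy probability masses
density_at_2 = bar_density(borders, bucket_probs, 2.0)
print(density_at_2)
```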

set_categorical_features

set_categorical_features(categorical_features: List[int])

Set the categorical features to use for the model.

These categorical features might be overridden by the preprocessing steps. This is controlled by two settings:

i) max_unique_values_as_categorical_feature: the maximum number of unique values a feature can have to be considered a categorical feature. Features with more unique values are considered numerical features.

ii) min_unique_values_as_numerical_feature: the minimum number of unique values a feature can have to be considered a numerical feature. Features with fewer unique values are considered categorical features.

Parameters:

Name Type Description Default
categorical_features List[int]

The feature indices of the categorical features.

required
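
The two override rules can be sketched as follows. The function resolve_categorical_features, the column-wise data layout, and the threshold values are hypothetical; the sketch only mirrors the rules as described and is not the library's preprocessing code.

```python
def resolve_categorical_features(
    columns,
    categorical_features,
    max_unique_values_as_categorical_feature,
    min_unique_values_as_numerical_feature,
):
    """Return the indices finally treated as categorical (illustrative).

    columns: one list of values per feature.
    categorical_features: indices declared categorical by the user.
    """
    resolved = []
    for idx, column in enumerate(columns):
        n_unique = len(set(column))
        if idx in categorical_features:
            # Rule i): a declared feature with too many unique values
            # is treated as numerical instead.
            if n_unique <= max_unique_values_as_categorical_feature:
                resolved.append(idx)
        elif n_unique < min_unique_values_as_numerical_feature:
            # Rule ii): an undeclared feature with too few unique values
            # is treated as categorical.
            resolved.append(idx)
    return resolved


columns = [
    [0, 1, 0, 1, 0, 1],              # declared, 2 unique values: stays categorical
    [0, 1, 2, 3, 4, 5],              # declared, 6 unique values: becomes numerical
    [0.1, 0.7, 1.3, 2.2, 3.1, 4.8],  # undeclared, 6 unique values: numerical
    [1, 1, 2, 2, 1, 2],              # undeclared, 2 unique values: categorical
]
resolved = resolve_categorical_features(
    columns,
    categorical_features=[0, 1],
    max_unique_values_as_categorical_feature=4,  # hypothetical threshold
    min_unique_values_as_numerical_feature=3,    # hypothetical threshold
)
print(resolved)  # → [0, 3]
```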

score

score(X, y, sample_weight=None)
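
No description is given for score. Assuming the RegressorMixin base is scikit-learn's mixin, score returns the coefficient of determination R² of the predictions on X with respect to y. The pure-Python sketch below shows that formula; r2_score_sketch is a hypothetical helper, not part of TabPFN or scikit-learn, and it does not handle the constant-target edge case.

```python
def r2_score_sketch(y_true, y_pred, sample_weight=None):
    """Coefficient of determination R^2, optionally sample-weighted."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total_weight = sum(sample_weight)
    # Weighted mean of the true targets.
    mean_true = sum(w * t for w, t in zip(sample_weight, y_true)) / total_weight
    # Residual sum of squares: error of the predictions.
    ss_res = sum(
        w * (t - p) ** 2 for w, t, p in zip(sample_weight, y_true, y_pred)
    )
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum(w * (t - mean_true) ** 2 for w, t in zip(sample_weight, y_true))
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
perfect = r2_score_sketch(y_true, [1.0, 2.0, 3.0])
mean_only = r2_score_sketch(y_true, [2.0, 2.0, 2.0])
print(perfect, mean_only)  # → 1.0 0.0
```

A score of 1.0 means perfect predictions, 0.0 matches always predicting the mean of y, and negative values are possible for worse models.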

estimate_memory_usage

estimate_memory_usage(
    X: ndarray | Tensor,
    unit: Literal["b", "mb", "gb"] = "gb",
    eval_position: int = -1,
    **overwrite_params
) -> float | None

Estimates the memory usage of the model.

Peak memory usage is accurate for save_peak_mem_factor in O(n_feats, n_samples) on average, but with significant outliers (2x). This calculation also does not include baseline usage and constant offsets. Baseline memory usage can be ignored if the maximum memory usage is left at the default None, which uses the free memory of the system. The constant offsets are not significant for large datasets.

Parameters:

Name Type Description Default
X ndarray

The feature matrix. X should be the concatenation of train and test data if self.fit_at_predict_time is set, and the train data only otherwise. If you add a batch dimension at position 1 to the table, it is used as the batch size during inference; otherwise the batch size depends on batch_size_inference and n_estimators.

required
unit Literal['b', 'mb', 'gb']

The unit to return the memory usage in (bytes, megabytes, or gigabytes).

'gb'

Returns:

Name Type Description
int float | None

The estimated memory usage in the requested unit.

estimate_computation_usage

estimate_computation_usage(
    X: ndarray,
    unit: Literal[
        "sequential_flops", "s"
    ] = "sequential_flops",
    eval_position: int = -1,
    **overwrite_params
) -> float | None

Estimates the sequential computation usage of the model. Those are the operations that are not parallelizable and are the main bottleneck for the computation time.

Parameters:

Name Type Description Default
X ndarray

The feature matrix. X should be the concatenation of train and test data if self.fit_at_predict_time is set, and the train data only otherwise.

required
unit str

The unit to return the computation usage in.

'sequential_flops'

Returns:

Name Type Description
int float | None

The estimated computation usage in unit of choice.