TabPFNUnsupervisedModel ¶

Bases: BaseEstimator

TabPFN unsupervised model for imputation, outlier detection, and synthetic data generation.

This model combines a TabPFNClassifier for categorical features and a TabPFNRegressor for numerical features to perform various unsupervised learning tasks on tabular data.

Parameters:

Name	Type	Description	Default
`tabpfn_clf`	`TabPFNClassifier`, optional	TabPFNClassifier instance for handling categorical features. If not provided, the model assumes that there are no categorical features in the data.	`None`
`tabpfn_reg`	`TabPFNRegressor`, optional	TabPFNRegressor instance for handling numerical features. If not provided, the model assumes that there are no numerical features in the data.	`None`

Attributes:

Name	Type	Description
`categorical_features`	`list`	List of indices of categorical features in the input data.

Example

>>> tabpfn_clf = TabPFNClassifier()
>>> tabpfn_reg = TabPFNRegressor()
>>> model = TabPFNUnsupervisedModel(tabpfn_clf, tabpfn_reg)
>>>
>>> X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
>>> model.fit(X)
>>>
>>> X_imputed = model.impute(X)
>>> X_outliers = model.outliers(X)
>>> X_synthetic = model.generate_synthetic_data(n_samples=100)

__init__ ¶

__init__(
    tabpfn_clf: Optional[TabPFNClassifier] = None,
    tabpfn_reg: Optional[TabPFNRegressor] = None,
) -> None

Initialize the TabPFNUnsupervisedModel.

Parameters:

Name	Type	Description	Default
`tabpfn_clf`	`TabPFNClassifier`, optional	TabPFNClassifier instance for handling categorical features. If not provided, the model assumes that there are no categorical features in the data.	`None`
`tabpfn_reg`	`TabPFNRegressor`, optional	TabPFNRegressor instance for handling numerical features. If not provided, the model assumes that there are no numerical features in the data.	`None`

fit ¶

fit(X: ndarray, y: Optional[ndarray] = None) -> None

Fit the model to the input data.

Parameters:

Name	Type	Description	Default
`X`	array-like of shape (n_samples, n_features)	Input data to fit the model.	required
`y`	array-like of shape (n_samples,), optional	Target values.	`None`

Returns:

Name	Type	Description
`self`	`None`	TabPFNUnsupervisedModel Fitted model.

set_categorical_features ¶

set_categorical_features(categorical_features)

Set the indices of the categorical features in the input data (stored in `categorical_features`).

impute ¶

impute(
    X: tensor, t: float = 1e-09, n_permutations: int = 10
) -> tensor

Impute missing values in the input data.

Parameters:

Name	Type	Description	Default
`X`	torch.Tensor of shape (n_samples, n_features)	Input data with missing values encoded as np.nan.	required
`t`	`float`	Temperature for sampling from the imputation distribution. Lower values result in more deterministic imputations.	`1e-09`

Returns:

Type	Description
`torch.Tensor`	Imputed data with missing values replaced, shape (n_samples, n_features).
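
The effect of a sampling temperature such as `t` can be illustrated on a small categorical distribution. The sketch below is a generic temperature-scaled softmax, not TabPFN's internal code; the helper `apply_temperature` and the example logits are hypothetical and exist only for illustration.

```python
import math


def apply_temperature(logits, t):
    """Return softmax(logits / t) as a list of probabilities."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical unnormalized scores for three candidate values of one feature
logits = [2.0, 1.0, 0.5]

probs_default = apply_temperature(logits, t=1.0)   # mass spread over all values
probs_cold = apply_temperature(logits, t=1e-9)     # mass collapses onto the top value
```

With a temperature close to zero, as in the default `t=1e-09`, practically all probability mass sits on the most likely value, which is why low temperatures give near-deterministic imputations.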

get_embeddings ¶

get_embeddings(
    X: tensor, per_column: bool = False
) -> tensor

Get the transformer embeddings for the test data X.

Parameters:

Name	Type	Description	Default
`X`	`tensor`	Test data to compute embeddings for.	required

Returns:

Type	Description
`torch.Tensor`	Embeddings of shape (n_samples, embedding_dim).

outliers ¶

outliers(X: tensor, n_permutations: int = 10) -> tensor

Compute the sample probability of each sample in X by multiplying the per-feature probabilities according to the chain rule of probability. This is the preferred implementation for outlier detection. The first feature is estimated using a zero feature as input.

Parameters:

Name	Type	Description	Default
`X`	`tensor`	Samples to calculate the sample probability for, shape (n_samples, n_features).	required

Returns:

Type	Description
`tensor`	Unnormalized sample probability for each sample in X, shape (n_samples,)
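
The chain-rule factorization p(x) = p(x_1) · p(x_2 | x_1) · … · p(x_d | x_1, …, x_{d-1}) can be sketched in a few lines. This is a toy illustration only: the per-feature probabilities below are made-up numbers rather than model outputs, and `sample_log_probability` is a hypothetical helper. Working in log space is an assumption made here to avoid underflow and is not necessarily what the library does.

```python
import math


def sample_log_probability(conditional_probs):
    """Sum log p(x_i | x_<i) over features, i.e. the log of their product."""
    return sum(math.log(p) for p in conditional_probs)


# Hypothetical per-feature conditional probabilities for two samples
typical = [0.6, 0.5, 0.7]     # every feature is plausible given the previous ones
unusual = [0.6, 0.01, 0.05]   # two features are very unlikely given the previous ones

log_p_typical = sample_log_probability(typical)
log_p_unusual = sample_log_probability(unusual)

# The sample with the lower joint probability is the stronger outlier candidate
is_unusual_more_outlying = log_p_unusual < log_p_typical
```

Under this reading, samples with a low returned probability are the candidates for outliers.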

generate_synthetic_data ¶

generate_synthetic_data(
    n_samples=100, t=1.0, n_permutations=3
)

Generate synthetic data using the trained models. The data is produced by running the imputation method on a matrix filled entirely with NaNs. Samples are generated feature by feature in one pass, so within each feature the samples do not depend on each other.

Parameters:

Name	Type	Description	Default
`n_samples`	`int`	Number of synthetic samples to generate.	`100`
`t`	`float`	Temperature for sampling from the imputation distribution. Lower values result in more deterministic samples.	`1.0`

Returns:

Type	Description
`torch.Tensor`	Generated synthetic data, shape (n_samples, n_features).
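
The idea of generating data by imputing an all-NaN matrix can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `generate_by_imputation` and `toy_sampler` are hypothetical names, and the toy sampler merely stands in for a fitted model that samples one feature given the features already filled in.

```python
import math
import random


def generate_by_imputation(n_samples, n_features, sample_feature, seed=0):
    """Fill an all-NaN matrix one feature (column) at a time."""
    rng = random.Random(seed)
    X = [[math.nan] * n_features for _ in range(n_samples)]
    for j in range(n_features):      # single pass over the features
        for row in X:                # each row is filled without looking at other rows
            row[j] = sample_feature(j, row[:j], rng)
    return X


def toy_sampler(j, previous, rng):
    """Hypothetical stand-in for a fitted model: noise centred on the filled values."""
    centre = sum(previous) / len(previous) if previous else 0.0
    return rng.gauss(centre, 1.0)


X_synth = generate_by_imputation(n_samples=5, n_features=3, sample_feature=toy_sampler)
```

Each column is conditioned only on the columns already generated for the same row, which mirrors the feature-by-feature description above.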
