feets package

Subpackages

Submodules

feets.core module

Core functionalities of feets.

class feets.core.FeatureSpace(data=None, only=None, exclude=None, dask_options=None, **kwargs)

Bases: object

Class to select and extract features from a time series.

The FeatureSpace class allows for the extraction of selected features from the available data vectors (e.g., magnitude, time, error, second magnitude) of one or more time series.

The data, only, and exclude filters can be combined to control the selection of features to be extracted. If no filter is provided, the selection will include all the available features.

Parameters:
data : array_like, optional

List of available data vectors to extract from. If provided, only the features that can be computed from the selected vectors (or a subset of them) will be included.

only : array_like, optional

List of features to be extracted. If provided, only the selected features will be included. It must be disjoint from exclude.

exclude : array_like, optional

List of features to be excluded from the extraction. If provided, all features except the selected ones will be included. It must be disjoint from only.

**kwargs

Additional parameters used to initialize the extractors.

Attributes:
features : frozenset

The features selected for extraction, based on the provided filters.

extractors : np.ndarray

The extractor instances used to compute the features.

required_data : frozenset

The data vectors required for the extraction.

dask_options : dict

Options to be passed to the Dask scheduler.

Raises:
ValueError

If an invalid combination of data, only, and exclude is provided.
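
The way the three filters combine, and when a ValueError is raised, can be sketched in plain Python. This is only an illustration under stated assumptions, not the feets implementation: the `REGISTRY` mapping, the `select_features` helper, and the data requirements listed for each feature are hypothetical.

```python
# Hypothetical registry: feature name -> data vectors it needs.
REGISTRY = {
    "Mean": {"magnitude"},
    "Std": {"magnitude"},
    "PeriodLS": {"magnitude", "time"},
    "Color": {"magnitude", "magnitude2"},
}


def select_features(data=None, only=None, exclude=None):
    """Return the feature names left after applying the three filters."""
    only = set(only or [])
    exclude = set(exclude or [])
    if only & exclude:
        # Assumed validation: the two lists must not share any feature.
        raise ValueError(f"'only' and 'exclude' overlap: {sorted(only & exclude)}")

    selected = set(REGISTRY)
    if data is not None:
        available = set(data)
        # Keep the features whose required vectors are all available.
        selected = {name for name in selected if REGISTRY[name] <= available}
    if only:
        selected &= only
    selected -= exclude
    return frozenset(selected)


print(sorted(select_features(data=["magnitude", "time"])))  # ['Mean', 'PeriodLS', 'Std']
print(sorted(select_features(only=["Mean", "Std"])))        # ['Mean', 'Std']
print(sorted(select_features(data=["magnitude"], exclude=["Std"])))  # ['Mean']
```

With no filter at all, the sketch returns every registered feature, which mirrors the default described above.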

See also

feets.Features

Class to manage and manipulate feature extraction results.

feets.Extractor

Abstract base class for feature extractors.

dask.compute

Compute several dask collections at once.

Examples

Using data filter to specify the available data vectors:

>>> fs = FeatureSpace(data=['magnitude', 'time'])
>>> # The resulting `FeatureSpace` will only extract the features that
>>> # depend on 'magnitude' and/or 'time'.
>>> fs.extract(**lc)
<Features feature_names={'Mean', 'Std', 'PeriodLS', 'Signature', ...}, length=1>

Using only filter to select specific features for extraction:

>>> fs = FeatureSpace(only=['Mean', 'Std'])
>>> # The resulting `FeatureSpace` will only extract the 'Mean' and 'Std'
>>> # features, regardless of the available data vectors.
>>> fs.extract(**lc)
<Features feature_names={'Mean', 'Std'}, length=1>

Using exclude filter to exclude specific features from extraction:

>>> fs = FeatureSpace(exclude=['Mean', 'Std'])
>>> # The resulting `FeatureSpace` will extract all features except for
>>> # 'Mean' and 'Std', regardless of the available data vectors.
>>> fs.extract(**lc)
<Features feature_names={'PeriodLS', 'Signature', ...}, length=1>

Configuring the extractors with additional parameters:

>>> fs = FeatureSpace(
...     data=['magnitude', 'time'],
...     PeriodLS={'nperiods': 5},
...     Signature={'phase_bins': 20, 'mag_bins': 15}
... )
>>> # The resulting `FeatureSpace` will extract features that depend on
>>> # 'magnitude' and 'time', with the specified parameters for the
>>> # `PeriodLS` and `Signature` extractors.
>>> fs.extract(**lc)
<Features feature_names={'Mean', 'Std', 'PeriodLS', 'Signature', ...}, length=1>
extract(**lc)

Extract the selected features from the provided light curve.

Parameters:
**lc : dict

A light curve represented as a dictionary, mapping data vector names to their values.

Returns:
Features

A collection of the features extracted from the provided light curve.

See also

feets.Features

Class to manage and manipulate feature extraction results.

extract_many

Examples

>>> fs = FeatureSpace(only=['Mean'])
>>> fs.extract(magnitude=[1, 2, 3])
Features(feature_names={'Mean'}, length=1)
extract_many(*lcs)

Extract the selected features from the provided light curves.

Parameters:
*lcs : list of dict

A list of light curves, where each light curve is a dictionary mapping data vector names to their values.

Returns:
Features

A collection of extracted features of the provided light curves.

See also

feets.Features

Class to manage and manipulate feature extraction results.

extract

Examples

>>> fs = FeatureSpace(only=['Mean'])
>>> fs.extract_many({'magnitude': [1, 2, 3]}, {'magnitude': [4, 5, 6]})
Features(feature_names={'Mean'}, length=2)
property extractors

np.ndarray: The extractor instances used to compute the features.

The extractors are ordered according to their dependencies, meaning that the extractors that depend on others come after those they depend on.
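
A dependency-respecting order of this kind is a topological sort. The sketch below uses the standard-library `graphlib` module to illustrate the idea; the extractor names and the `DEPENDENCIES` map are hypothetical, and feets may compute its ordering differently.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: extractor name -> extractors it depends on.
DEPENDENCIES = {
    "Mean": set(),
    "Period": set(),
    "FoldedAmplitude": {"Period", "Mean"},
}


def order_extractors(dependencies):
    """Return the names so that each one follows all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = order_extractors(DEPENDENCIES)
print(order)  # "FoldedAmplitude" always comes after "Period" and "Mean"
```

Several valid orders can exist for the same map; the only guarantee is that no extractor precedes one it depends on.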

classmethod from_dict(data)

Create a FeatureSpace object from a dictionary representation.

Parameters:
data : dict

A dictionary representation of the FeatureSpace, including the data vectors required for extraction, the selected features, the list of extractors with their parameters, and the Dask options.

Returns:
FeatureSpace

A FeatureSpace object configured with the features, required data vectors, dask options, and extractors from the provided dictionary.

See also

to_dict
classmethod from_lightcurve(**lc)

Create a FeatureSpace for the provided light curve.

The resulting FeatureSpace will be configured to extract only the features that can be computed from the data vectors present in the provided light curve.

Parameters:
**lc : dict

A light curve represented as a dictionary, mapping data vector names to their values.

Returns:
FeatureSpace

A FeatureSpace instance configured for the provided light curve.

See also

from_lightcurves

Examples

>>> lc = {'magnitude': [1, 2, 3]}
>>> fs = FeatureSpace.from_lightcurve(**lc)
>>>
>>> # The resulting `FeatureSpace` will only extract features that
>>> # depend on 'magnitude'.
>>> fs.extract(**lc)
Features(feature_names={'Mean', 'Std', ...}, length=1)
classmethod from_lightcurves(*lcs)

Create a FeatureSpace for the provided light curves.

This method determines the common data vectors (e.g., ‘magnitude’, ‘time’) present across all provided light curves. It then creates a FeatureSpace configured to extract only the features that can be computed from this common set of data vectors.

Parameters:
*lcs : list of dict

A list of light curves, where each light curve is a dictionary mapping data vector names to their values.

Returns:
FeatureSpace

A FeatureSpace instance configured for the common data vectors.

Raises:
ValueError

If no common data vectors are found among the light curves.

See also

from_lightcurve

Examples

>>> lc1 = {'magnitude': [1, 2, 3]}
>>> lc2 = {'time': [0.1, 0.2, 0.3], 'magnitude': [4, 5, 6]}
>>>
>>> # The common data vector is 'magnitude'.
>>> fs = FeatureSpace.from_lightcurves(lc1, lc2)
>>>
>>> # The resulting `FeatureSpace` will only extract features that
>>> # depend on 'magnitude'.
>>> fs.extract(**lc1)
Features(feature_names={'Mean', 'Std', ...}, length=1)
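
The search for common data vectors described above amounts to a set intersection over the keys of each light curve. The following is a minimal sketch of that step; `common_data_vectors` is a hypothetical helper, not part of feets.

```python
def common_data_vectors(*lcs):
    """Return the data vector names shared by every light curve."""
    if not lcs:
        raise ValueError("At least one light curve is required.")
    common = set(lcs[0])
    for lc in lcs[1:]:
        common &= set(lc)
    if not common:
        raise ValueError("No common data vectors found among the light curves.")
    return frozenset(common)


lc1 = {"magnitude": [1, 2, 3]}
lc2 = {"time": [0.1, 0.2, 0.3], "magnitude": [4, 5, 6]}
print(sorted(common_data_vectors(lc1, lc2)))  # ['magnitude']
```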
property required_data

frozenset: The data vectors required for the extraction.

property selected_features

frozenset: The features selected for extraction.

to_dict()

Convert the FeatureSpace object to a dictionary representation.

Returns:
dict

A dictionary representation of the FeatureSpace, including the data vectors required for extraction, the selected features, the list of extractors with their parameters, and the Dask options.

to_json(*, path_or_buffer=None, **kwargs)

Serialize the FeatureSpace to a JSON formatted string or file.

Parameters:
path_or_buffer : str, pathlib.Path, file-like object or None, optional

The file path or buffer to write the JSON data to. If None, the JSON data is returned as a string. Defaults to None.

**kwargs

Additional parameters to pass to io.store_json.

Returns:
str

The JSON formatted string if path_or_buffer is None.

See also

to_dict, to_yaml
to_yaml(*, path_or_buffer=None, **kwargs)

Serialize the FeatureSpace to a YAML formatted string or file.

Parameters:
path_or_buffer : str, pathlib.Path, file-like object or None, optional

The file path or buffer to write the YAML data to. If None, the YAML data is returned as a string. Defaults to None.

**kwargs

Additional parameters to pass to io.store_yaml.

Returns:
str

The YAML formatted string if path_or_buffer is None.

feets.features module

Manage and manipulate feature extraction results.

class feets.features.Features(features, extractors)

Bases: Sequence

Class to manage and manipulate feature extraction results.

The Features class encapsulates the results of feature extraction performed on multiple light curves. It provides an interface to access the extracted features either by feature name or by light curve index.

Parameters:
features : array_like of dict

The results of the feature extraction for each of the light curves.

extractors : array_like of Extractor

The extractor instances used to compute the features.

Attributes:
features : np.ndarray

The extracted features by light curve.

extractors : np.ndarray

The extractor instances used to compute the features.

feature_names : frozenset

The names of the extracted features.

length : int

The number of light curves.

Examples

>>> from feets import FeatureSpace
>>> fs = FeatureSpace(only=["Std", "Mean"])
>>> results = fs.extract_many(
...     {"magnitude": [1, 1.5, 2]},
...     {"magnitude": [1, 2, 3]}
... )
>>> results
<Features feature_names={'Std', 'Mean'}, length=2>

Accessing results by feature name:

>>> results.Mean
array([1.5, 2. ])
>>> results.Std
array([0.5, 1. ])

Accessing results by light curve index:

>>> results[0]
{'Std': np.float64(0.5), 'Mean': np.float64(1.5)}
>>> results[1]
{'Std': np.float64(1.0), 'Mean': np.float64(2.0)}
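
The two access patterns shown above (by feature name and by light curve index) can be imitated with a small `Sequence` subclass. This is a toy sketch for illustration only: `FeaturesSketch` is a hypothetical class that returns plain lists, whereas the real class returns NumPy arrays and may be implemented quite differently.

```python
from collections.abc import Sequence


class FeaturesSketch(Sequence):
    """Toy container holding one dict of feature values per light curve."""

    def __init__(self, features):
        self._features = list(features)

    def __getitem__(self, index):
        # Access by light curve index.
        return self._features[index]

    def __len__(self):
        return len(self._features)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Private names are
        # rejected early so a missing "_features" cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            # Treat the name as a feature and collect it across light curves.
            return [lc[name] for lc in self._features]
        except KeyError:
            raise AttributeError(name) from None


results = FeaturesSketch([{"Std": 0.5, "Mean": 1.5}, {"Std": 1.0, "Mean": 2.0}])
print(results.Mean)  # [1.5, 2.0]
print(results[0])    # {'Std': 0.5, 'Mean': 1.5}
print(len(results))  # 2
```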
as_frame(**kwargs)

Convert the extraction results into a pandas.DataFrame.

This method transforms the extracted features into a pandas.DataFrame, where each row corresponds to a light curve and each column represents a feature.

The conversion process can be parallelized to improve performance on large datasets.

Parameters:
**kwargs

Keyword arguments passed to the joblib.Parallel constructor, used when parallel processing the pandas.DataFrame conversion.

Returns:
pandas.DataFrame

A pandas.DataFrame representation of the extracted features. Each row corresponds to a light curve and each column represents a feature.

Examples

>>> from feets import FeatureSpace
>>> fs = FeatureSpace(only=["Std", "Mean"])
>>> results = fs.extract_many(
...     {"magnitude": [1, 1.5, 2]},
...     {"magnitude": [1, 2, 3]}
... )
>>> results.as_frame()
Features     Std  Mean
Light Curve
0            0.5   1.5
1            1.0   2.0
property feature_names

frozenset: The names of the extracted features.

property length

int: The number of light curves.

feets.io module

Serialize and deserialize feets.FeatureSpace objects.

class feets.io.CustomJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

Custom JSON (https://json.org) encoder for feets.FeatureSpace objects.

This class extends the json.JSONEncoder to add support for the following objects and types:

Python                                        JSON
tuple, set, frozenset, np.ndarray             array
datetime                                      string
np.integer, np.floating, np.complexfloating   number
np.true                                       true
np.false                                      false

Attributes:
CONVERTERS : tuple

A tuple of (type, converter) pairs mapping data types to their corresponding converter functions.

See also

json.JSONEncoder

Extensible JSON (https://json.org) encoder for Python data structures.

CONVERTERS = ((<class 'tuple'>, <class 'list'>), (<class 'set'>, <class 'list'>), (<class 'frozenset'>, <class 'list'>), (<class 'datetime.datetime'>, <method 'isoformat' of 'datetime.datetime' objects>), (<class 'numpy.integer'>, <class 'int'>), (<class 'numpy.floating'>, <class 'float'>), (<class 'numpy.complexfloating'>, <class 'complex'>), (<class 'numpy.bool'>, <class 'bool'>), (<class 'numpy.ndarray'>, <method 'tolist' of 'numpy.ndarray' objects>))
default(obj)

Serialize an object to a JSON-serializable format.

This method overrides the default method of the json.JSONEncoder class to provide custom serialization for the data structures defined in the CONVERTERS attribute, or calls the base implementation for any other object.

Returns:
object

The JSON-serializable representation of the object.

Raises:
TypeError

If the object does not match any of the types in CONVERTERS.
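
The dispatch described for default can be sketched with the standard library alone. The example below is an assumption-laden illustration rather than the feets source: `SketchJSONEncoder` and its reduced `CONVERTERS` tuple are hypothetical, and the NumPy types are omitted so the snippet has no third-party dependencies.

```python
import datetime
import json

# (type, converter) pairs, checked in order; standard-library types only.
CONVERTERS = (
    (set, list),
    (frozenset, list),
    (datetime.datetime, datetime.datetime.isoformat),
)


class SketchJSONEncoder(json.JSONEncoder):
    """Toy encoder that converts an object with the first matching entry."""

    def default(self, obj):
        for type_, converter in CONVERTERS:
            if isinstance(obj, type_):
                return converter(obj)
        # Let the base class raise TypeError for unsupported objects.
        return super().default(obj)


payload = {"features": frozenset({"Mean"}), "created": datetime.datetime(2024, 1, 1)}
print(json.dumps(payload, cls=SketchJSONEncoder))
# {"features": ["Mean"], "created": "2024-01-01T00:00:00"}
```

json only calls default for objects it cannot serialize natively, so built-in types such as list, dict, and str never reach the converter loop.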

feets.io.none_open_or_buffer(path_or_buffer, mode)

Context manager to handle file paths or buffers as file-like objects.

This function provides a unified way to handle file paths, existing buffers, or no target at all (None), and yields a file-like object for reading or writing.

Parameters:
path_or_buffer : str, pathlib.Path, file-like object or None
  • If str or pathlib.Path, the file at the given path is opened with the specified mode.

  • If a file-like object, it is yielded directly.

  • If None, an io.StringIO in-memory buffer is created and yielded.

mode : str

The mode in which to open the file (e.g., ‘r’, ‘w’). This is ignored if path_or_buffer is not a path.

Yields:
file-like object

An open, ready-to-use file-like object.
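
The three cases listed for path_or_buffer map naturally onto a generator-based context manager. The sketch below shows one way to write it, assuming that file-like objects passed in by the caller are left open; `open_or_buffer_sketch` is a hypothetical name and the actual function may differ in its details.

```python
import contextlib
import io
import pathlib


@contextlib.contextmanager
def open_or_buffer_sketch(path_or_buffer, mode):
    """Yield a file-like object for a path, an existing buffer, or None."""
    if path_or_buffer is None:
        # No target given: hand out an in-memory text buffer.
        yield io.StringIO()
    elif isinstance(path_or_buffer, (str, pathlib.Path)):
        # A path: open the file and close it when the block exits.
        with open(path_or_buffer, mode) as fp:
            yield fp
    else:
        # Already file-like: the caller keeps ownership, so it is not closed.
        yield path_or_buffer


buffer = io.StringIO()
with open_or_buffer_sketch(buffer, "w") as fp:
    fp.write("hello")
print(buffer.getvalue())  # hello
```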

feets.io.read_json(path_or_buffer)

Deserialize a JSON formatted string or file to feets.FeatureSpace.

Parameters:
path_or_buffer : str, pathlib.Path or file-like object

The file path, buffer, or stream to read the JSON data from.

Returns:
feets.FeatureSpace

A feets.FeatureSpace object containing the deserialized data.

See also

feets.FeatureSpace

Class to select and extract features from a time series.

store_json
feets.io.read_yaml(path_or_buffer)

Deserialize a YAML formatted string or file to feets.FeatureSpace.

Parameters:
path_or_buffer : str, pathlib.Path or file-like object

The file path, buffer, or stream to read the YAML data from.

Returns:
feets.FeatureSpace

A feets.FeatureSpace object containing the deserialized data.

See also

feets.FeatureSpace

Class to select and extract features from a time series.

store_yaml
feets.io.store_json(fspace, path_or_buffer=None, **kwargs)

Serialize a feets.FeatureSpace to a JSON formatted string or file.

Parameters:
fspace : feets.FeatureSpace

The feets.FeatureSpace object to serialize. This object must implement a to_dict method that returns a serializable representation.

path_or_buffer : str, pathlib.Path, file-like object or None, default=None

The file path, buffer, or stream to write the JSON data to. If None, the JSON data is returned as a string.

**kwargs

Additional keyword arguments passed to json.dump when serializing the feature space.

Returns:
str or None

If path_or_buffer is None, returns a JSON formatted string representing the feature space. Otherwise, writes the JSON data to the specified file or buffer and returns None.

Raises:
TypeError

If the provided feature space contains non-serializable objects.

See also

feets.FeatureSpace

Class to select and extract features from a time series.

read_json
json.dump
feets.io.store_yaml(fspace, path_or_buffer=None, **kwargs)

Serialize a feets.FeatureSpace to a YAML formatted string or file.

Parameters:
fspace : feets.FeatureSpace

The feets.FeatureSpace object to serialize. This object must implement a to_dict method that returns a serializable representation.

path_or_buffer : str, pathlib.Path, file-like object or None, default=None

The file path, buffer, or stream to write the YAML data to. If None, the YAML data is returned as a string.

**kwargs

Additional keyword arguments passed to yaml.safe_dump when serializing the feature space.

Returns:
str or None

If path_or_buffer is None, returns a YAML formatted string representing the feature space. Otherwise, writes the YAML data to the specified file or buffer and returns None.

Raises:
TypeError

If the provided feature space contains non-serializable objects.

See also

feets.FeatureSpace

Class to select and extract features from a time series.

read_yaml
yaml.safe_dump

feets.preprocess module

Functions for preprocessing light curve data vectors.

feets.preprocess.align(time, time2, magnitude, magnitude2, error=None, error2=None)

Synchronizes two light curves in different bands.

Parameters:
time : array-like
time2 : array-like
magnitude : array-like
magnitude2 : array-like
error : array-like, optional
error2 : array-like, optional

Returns:
aligned_time : array-like
aligned_magnitude : array-like
aligned_magnitude2 : array-like
aligned_error : array-like
aligned_error2 : array-like

feets.preprocess.remove_noise(time, magnitude, error, error_limit=3, std_limit=5)

Removes noise from the light curve data vectors.

Points farther than std_limit standard deviations from the mean magnitude, or with errors greater than error_limit times the mean error, are considered noise and are removed.

Parameters:
time : array-like
magnitude : array-like
error : array-like
error_limit : float, default=3
std_limit : float, default=5

Returns:
time_clean : array-like
magnitude_clean : array-like
error_clean : array-like
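
A pure-Python sketch of this kind of filter is given below. It rests on an assumption about the intended rule: a point is kept only if it lies within std_limit (population) standard deviations of the mean magnitude and its error does not exceed error_limit times the mean error. `remove_noise_sketch` is a hypothetical helper and is not the library function.

```python
import statistics


def remove_noise_sketch(time, magnitude, error, error_limit=3, std_limit=5):
    """Keep points close to the mean magnitude and with small errors."""
    mag_mean = statistics.fmean(magnitude)
    mag_std = statistics.pstdev(magnitude)
    error_tolerance = error_limit * statistics.fmean(error)

    kept = [
        (t, m, e)
        for t, m, e in zip(time, magnitude, error)
        if abs(m - mag_mean) <= std_limit * mag_std and e <= error_tolerance
    ]
    # Split the surviving points back into three parallel lists.
    time_clean = [t for t, _, _ in kept]
    magnitude_clean = [m for _, m, _ in kept]
    error_clean = [e for _, _, e in kept]
    return time_clean, magnitude_clean, error_clean


time = [0, 1, 2, 3, 4]
magnitude = [10.0, 10.1, 9.9, 10.0, 10.0]
error = [0.1, 0.1, 0.1, 0.1, 2.0]  # the last error is far above the mean error
time_clean, magnitude_clean, error_clean = remove_noise_sketch(time, magnitude, error)
print(time_clean)  # [0, 1, 2, 3]
```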

feets.runner module

Run multiple feature extractors in parallel.

exception feets.runner.DataRequiredError

Bases: ValueError

A required data vector is missing from the light curve.

feets.runner.run(*, extractors, selected_features, required_data, lcs, dask_options=None)

Run instances of feature extractors on a collection of light curves.

Executes the specified extractor instances on each provided light curve, returning the extracted features for each. Feature extraction is performed in parallel using Dask, enabling efficient computation across multiple light curves. The order of execution respects dependencies between extractors; ensure that the extractors list is topologically sorted so that dependencies are satisfied.

Parameters:
extractors : array_like of feets.extractors.Extractor

Feature extractor instances to apply. Must be sorted so that any extractor appears after those it depends on.

selected_features : array_like of str

Names of features to extract from each light curve.

required_data : array_like of str

Names of required data fields that must be present in each light curve.

lcs : array_like of dict

Light curves to process, each represented as a dictionary of data vectors.

dask_options : dict, optional

Options for the Dask scheduler. Defaults to {"scheduler": "processes"}.

Returns:
list of dict

List of dictionaries, one per input light curve, with the extracted feature values. Each dictionary contains the extracted features specified in selected_features. The order of the list matches the input lcs.

Raises:
DataRequiredError

If any of the required data vectors are missing from a light curve.

See also

feets.Extractor

Abstract base class for feature extractors.

feets.FeatureSpace

Class to select and extract features from a time series.

dask.compute

Notes

Feature extraction is parallelized using Dask. You can control parallelism and scheduler behavior via the dask_options parameter.

For more information on Dask, see: https://docs.dask.org/en/stable/

Examples

>>> from feets.extractors import Mean
>>>
>>> # Instantiate the feature extractor
>>> mean_extractor = Mean()
>>>
>>> # Light curves to process
>>> lcs = [{"magnitude": [1, 2, 3]}, {"magnitude": [4, 5, 6]}]
>>>
>>> # Run the feature extraction
>>> run(
...     extractors=[mean_extractor],
...     selected_features=["Mean"],
...     required_data=["magnitude"],
...     lcs=lcs
... )
[{'Mean': np.float64(2.0)}, {'Mean': np.float64(5.0)}]
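
The control flow described for run (validate the required data, apply the extractors in dependency order, keep only the selected features) can be illustrated without Dask. The sketch below is sequential and entirely hypothetical: `run_sketch`, the local `DataRequiredError` stand-in, and the function-style extractors are assumptions made for the example, not the feets API.

```python
class DataRequiredError(ValueError):
    """Local stand-in for the exception documented above."""


def run_sketch(*, extractors, selected_features, required_data, lcs):
    """Apply each extractor, in order, to every light curve."""
    results = []
    for index, lc in enumerate(lcs):
        missing = [name for name in required_data if name not in lc]
        if missing:
            raise DataRequiredError(f"Light curve {index} is missing {missing}")
        computed = {}
        for extractor in extractors:
            # Each hypothetical extractor sees the data and earlier results.
            computed.update(extractor(lc, computed))
        results.append({name: computed[name] for name in selected_features})
    return results


def mean_extractor(lc, computed):
    values = lc["magnitude"]
    return {"Mean": sum(values) / len(values)}


def std_extractor(lc, computed):
    # Depends on "Mean", so it must run after mean_extractor.
    values = lc["magnitude"]
    variance = sum((v - computed["Mean"]) ** 2 for v in values) / len(values)
    return {"Std": variance ** 0.5}


lcs = [{"magnitude": [1, 2, 3]}, {"magnitude": [4, 5, 6]}]
results = run_sketch(
    extractors=[mean_extractor, std_extractor],
    selected_features=["Std"],
    required_data=["magnitude"],
    lcs=lcs,
)
print(results)  # one dict per light curve, each containing only 'Std'
```

Swapping the two extractors in the list would raise a KeyError in this sketch, which is why the extractors argument has to be topologically sorted.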

Module contents

feets: feATURE eXTRACTOR FOR tIME sERIES.

In time-domain astronomy, data gathered from telescopes is usually represented in the form of light-curves. These are time series that show the brightness variation of an object over a period of time. Based on the variability characteristics of the light-curves, celestial objects can be classified into different groups (quasars, long period variables, eclipsing binaries, etc.) and consequently be studied in depth independently.

In order to characterize this variability, some of the existing methods use machine learning algorithms that base their decisions on light-curve features. Features, the topic of the following work, are numerical descriptors that aim to characterize and distinguish the different variability classes. They can range from basic statistical measures such as the mean or the standard deviation, to complex time-series characteristics such as the autocorrelation function.
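
As a concrete illustration of such descriptors, the sketch below computes the mean, the population standard deviation, and a lag-1 sample autocorrelation for a short magnitude series using only the standard library. The `lag1_autocorrelation` helper is hypothetical and is not how feets implements its extractors.

```python
import statistics


def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1 (undefined for a constant series)."""
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator


magnitude = [1.0, 2.0, 3.0, 4.0, 5.0]
print(statistics.fmean(magnitude))      # 3.0
print(statistics.pstdev(magnitude))     # approximately 1.414
print(lag1_autocorrelation(magnitude))  # 0.4
```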

In this package we present a library with a compilation of some of the existing light-curve features. The main goal is to create a collaborative and open tool where every user can characterize or analyze an astronomical photometric database while also contributing to the library by adding new features. However, it is important to highlight that this library is not restricted to the astronomical field and could also be applied to any kind of time series.

Our vision is to be capable of analyzing and comparing light-curves from all the available astronomical catalogs in a standard and universal way. This would facilitate and make more efficient tasks such as modelling, classification, data cleaning, outlier detection and data analysis in general. Consequently, when studying light-curves, astronomers and data analysts would be on the same wavelength and would not need to find a way of comparing or matching different features. In order to achieve this goal, the library should be run on every existing survey (MACHO, EROS, OGLE, Catalina, Pan-STARRS, etc.) and on future surveys (LSST), and the results should ideally be shared in the same open way as this library.

feets.read_json(path_or_buffer)

Deserialize a JSON formatted string or file to feets.FeatureSpace.

Parameters:
path_or_buffer : str, pathlib.Path or file-like object

The file path, buffer, or stream to read the JSON data from.

Returns:
feets.FeatureSpace

A feets.FeatureSpace object containing the deserialized data.

See also

feets.FeatureSpace

Class to select and extract features from a time series.

store_json
feets.read_yaml(path_or_buffer)

Deserialize a YAML formatted string or file to feets.FeatureSpace.

Parameters:
path_or_buffer : str, pathlib.Path or file-like object

The file path, buffer, or stream to read the YAML data from.

Returns:
feets.FeatureSpace

A feets.FeatureSpace object containing the deserialized data.

See also

feets.FeatureSpace

Class to select and extract features from a time series.

store_yaml