{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "nbsphinx": "hidden" }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "import seaborn as sns\n", "\n", "sns.set_theme()\n", "plt.rcParams[\"figure.figsize\"] = 9, 4.51" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Extractors Tutorial\n", "\n", "## 1. Introduction\n", "\n", "While `feets` comes with a wide array of pre-defined features, one of its core design principles is extensibility. You might have a novel feature in mind, need to implement an algorithm from a recent paper, or have a domain-specific metric that is not part of the standard library. `feets` allows you to seamlessly create and integrate your own feature extractors into its ecosystem.\n", "\n", "This tutorial will guide you through the process of building custom feature extractors. You will learn how to define an extractor, register it with the library, handle dependencies, accept parameters, and even generate features with dynamic names.\n", "\n", "\n", "## 2. Fundamentals\n", "\n", "\n", "In `feets`, a feature extractor is a Python class responsible for calculating one or more features from a light curve. To create a valid extractor, you must follow these rules:\n", "\n", "- The class must inherit from `feets.Extractor`.\n", "- The class must define a `features` attribute, a list of strings containing the names of the features it's able to compute.\n", "- The class must implement an `extract()` method. This method implements the feature extraction logic. 
It receives the light curve data (e.g., `magnitude`, `time`, `error`) as parameters and returns a dictionary where keys are the feature names (from the `features` list) and values are the calculated feature values.\n", "\n", "### Example 1: The `MaxMagMinTime` extractor\n", "\n", "As an example, let's create a simple extractor that finds the maximum magnitude and the minimum time value in a light curve:\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import feets\n", "\n", "import numpy as np\n", "\n", "\n", "# 1. Inherit from feets.Extractor\n", "class MaxMagMinTime(feets.Extractor):\n", "    # 2. Define the names of the features to be extracted\n", "    features = [\"magmax\", \"mintime\"]\n", "\n", "    # 3. Implement the extraction logic\n", "    # The parameters are the data vectors of the light curve.\n", "    def extract(self, magnitude, time):\n", "        # The return value must be a dictionary with keys matching\n", "        # the `features` list.\n", "        return {\"magmax\": np.max(magnitude), \"mintime\": np.min(time)}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Registering an extractor\n", "\n", "Once you have defined your extractor class, you need to add it to the underlying extractor registry in `feets` to make it available in any `FeatureSpace`. 
This is done using the `extractor_registry` utility:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "__main__.MaxMagMinTime" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from feets import extractor_registry\n", "\n", "# Register the class to make it available in the FeatureSpace\n", "extractor_registry.register_extractor(MaxMagMinTime)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, you can use the features `\"magmax\"` and `\"mintime\"` when defining a `FeatureSpace`, and `feets` will know how to compute them:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
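<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th>Features</th>\n", "      <th>mintime</th>\n", "      <th>magmax</th>\n", "    </tr>\n", "    <tr>\n", "      <th>Light Curve</th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>1.0</td>\n", "      <td>10.6</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>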
" ], "text/plain": [ "Features     mintime  magmax\n", "Light Curve                 \n", "0                1.0    10.6" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a FeatureSpace using our new features\n", "fs = feets.FeatureSpace(only={\"magmax\", \"mintime\"})\n", "\n", "# Let's extract the features from some sample data\n", "time = [1, 2, 3, 4, 5]\n", "magnitude = [10.2, 10.5, 10.1, 10.6, 10.4]\n", "\n", "features = fs.extract(time=time, magnitude=magnitude)\n", "features.as_frame()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Extractors with dependencies\n", "\n", "With the `MaxMagMinTime` example, we've seen an extractor that computes features based on the provided light-curve data vectors. In addition to these data vectors, an extractor may also depend on features computed by other extractors.\n", "\n", "To define a feature dependency, simply add the required feature name as a parameter to your extractor's `extract()` method, along with the other data vectors. Make sure that the dependency is computed by some extractor present in the extractor registry." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example 2: The `TimeDuration` extractor\n", "\n", "Let's create an extractor that calculates the total duration of the light curve (`max_time - min_time`). 
We can reuse the `mintime` feature from our previous `MaxMagMinTime` extractor as a dependency for our `extract()` method:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "__main__.TimeDuration" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class TimeDuration(feets.Extractor):\n", "    features = [\"duration\"]\n", "\n", "    # Add `mintime` as a parameter to define it as a dependency.\n", "    def extract(self, time, mintime):\n", "        return {\"duration\": np.max(time) - mintime}\n", "\n", "# Register the new extractor\n", "extractor_registry.register_extractor(TimeDuration)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you use this new extractor, `feets` will automatically resolve its dependencies. It inspects the signature of the `extract()` method and identifies which arguments correspond to feature names.\n", "\n", "In our `TimeDuration` example, `feets` recognizes that `mintime` is a required feature. It then searches the registry for an extractor that provides this feature (in this case, `MaxMagMinTime`) and ensures it is executed first. The resulting value is then passed as an argument to `TimeDuration.extract()`.\n", "\n", "This dependency resolution happens behind the scenes when a `FeatureSpace` object is initialized, creating an efficient execution plan for the selected features. You only need to request the final features of interest, and `feets` will automatically include and execute all the necessary intermediate steps." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected extractors in execution order: [MaxMagMinTime() TimeDuration()]\n" ] }, { "data": { "text/html": [ "
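<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th>Features</th>\n", "      <th>duration</th>\n", "    </tr>\n", "    <tr>\n", "      <th>Light Curve</th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>4</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>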
" ], "text/plain": [ "Features     duration\n", "Light Curve          \n", "0                   4" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# When we ask for \"duration\", feets knows it needs \"mintime\" first.\n", "# It will automatically add the MaxMagMinTime extractor to the plan.\n", "fs = feets.FeatureSpace(only={\"duration\"})\n", "print(f\"Selected extractors in execution order: {fs.extractors}\")\n", "\n", "# Only the requested feature appears in the result; the `mintime`\n", "# dependency is computed internally.\n", "features = fs.extract(time=time, magnitude=magnitude)\n", "features.as_frame()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Configuring extractors with parameters\n", "\n", "Some feature extraction algorithms require parameters to be configured, such as a specific quantile value or the number of bins for a histogram. `feets` supports this by allowing you to create configurable extractors.\n", "\n", "To add parameters to your extractor, define an `__init__()` method in the class. This method can accept and store any parameters you need. These stored parameters are then available within the `extract()` method, allowing you to customize the feature calculation logic.\n", "\n", "### Example 3: The `QuantileMagnitude` extractor\n", "\n", "Let's illustrate this with an extractor that calculates the magnitude at a given quantile. 
The quantile itself will be a parameter that we can configure when we use the extractor:\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "__main__.QuantileMagnitude" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class QuantileMagnitude(feets.Extractor):\n", "    features = [\"quantile_mag\"]\n", "\n", "    def __init__(self, quantile=0.5):\n", "        # Store the parameter\n", "        self.quantile = quantile\n", "\n", "    def extract(self, magnitude):\n", "        # Use the parameter in the calculation\n", "        q_mag = np.quantile(magnitude, self.quantile)\n", "        return {\"quantile_mag\": q_mag}\n", "\n", "extractor_registry.register_extractor(QuantileMagnitude)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use the new `QuantileMagnitude` extractor with a custom parameter, you need to specify the parameter's value when the `FeatureSpace` is initialized. This is achieved by passing a keyword argument to the `FeatureSpace` constructor where:\n", "\n", "- The keyword's name matches the extractor's class name (`QuantileMagnitude`).\n", "- The keyword's value is a dictionary containing the parameters to be passed to the extractor's `__init__()` method.\n", "\n", "For instance, to compute the 90th percentile of the magnitude, you would configure the `QuantileMagnitude` extractor by setting its `quantile` parameter to `0.9`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
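<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th>Features</th>\n", "      <th>quantile_mag</th>\n", "    </tr>\n", "    <tr>\n", "      <th>Light Curve</th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>10.56</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>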
" ], "text/plain": [ "Features     quantile_mag\n", "Light Curve              \n", "0                   10.56" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fs_quantile = feets.FeatureSpace(\n", "    only=[\"quantile_mag\"],\n", "    QuantileMagnitude={\"quantile\": 0.9}\n", ")\n", "\n", "features = fs_quantile.extract(magnitude=magnitude)\n", "features.as_frame()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Handling multi-value features\n", "\n", "Some feature extraction tasks naturally produce multiple values from a single computation. A common example is fitting a model to the data, where the result is a set of coefficients. `feets` is designed to handle this scenario gracefully.\n", "\n", "The standard practice is to create an extractor that returns a single feature whose value is an array or a dictionary containing all the individual results. When you convert the extracted features to a `pandas.DataFrame` using the `.as_frame()` method of the resulting `Features` object, it will automatically \"flatten\" this non-scalar feature. It creates a separate column for each value, typically by appending a suffix to the original feature name.\n", "\n", "### Example 4: The `PolynomialFit` extractor\n", "\n", "To demonstrate this, let's build an extractor that performs a polynomial fit on the light-curve data. This extractor will be configurable, allowing the user to specify the degree of the polynomial. The `extract()` method will compute the coefficients of the fit and return them as a single array-like feature."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "__main__.PolynomialFit" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class PolynomialFit(feets.Extractor):\n", "    # Define the single, multi-value feature name.\n", "    features = [\"poly_coeffs\"]\n", "\n", "    def __init__(self, degree=1):\n", "        super().__init__()\n", "        self.degree = degree\n", "\n", "    def extract(self, time, magnitude):\n", "        # Fit the polynomial\n", "        coeffs = np.polyfit(time, magnitude, self.degree)\n", "        # Return the coefficients as a single array\n", "        return {\"poly_coeffs\": coeffs}\n", "\n", "extractor_registry.register_extractor(PolynomialFit)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
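<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th>Features</th>\n", "      <th>poly_coeffs_0</th>\n", "      <th>poly_coeffs_1</th>\n", "      <th>poly_coeffs_2</th>\n", "    </tr>\n", "    <tr>\n", "      <th>Light Curve</th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>-0.007143</td>\n", "      <td>0.092857</td>\n", "      <td>10.16</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>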
" ], "text/plain": [ "Features     poly_coeffs_0  poly_coeffs_1  poly_coeffs_2\n", "Light Curve                                             \n", "0                -0.007143       0.092857          10.16" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Instantiate the extractor with its parameters\n", "fs_poly = feets.FeatureSpace(\n", "    only=[\"poly_coeffs\"],\n", "    PolynomialFit={\"degree\": 2},\n", ")\n", "\n", "# The resulting feature set should have columns like `poly_coeffs_0`,\n", "# `poly_coeffs_1`, etc.\n", "features = fs_poly.extract(time=time, magnitude=magnitude)\n", "features.as_frame()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Customizing feature flattening\n", "\n", "While the automatic flattening of multi-value features is convenient, there are times when you might want more control over the resulting column names or the flattening logic itself. `feets` allows you to customize this behavior by implementing a special `flatten_feature()` method in your extractor.\n", "\n", "This method is called by `.as_frame()` when it encounters a non-scalar feature value from that extractor. It gives you the opportunity to define exactly how the feature should be represented in the final `pandas.DataFrame`.\n", "\n", "The `flatten_feature()` method receives the feature name and its computed value and must return a dictionary where keys are the desired column names and values are the corresponding scalar values. This provides fine-grained control over the output for array-like and dictionary-like features and can even be used to handle more complex, custom data structures.\n", "\n", "### Example 5: The `MagnitudeStats` extractor\n", "\n", "Let's create an extractor called `MagnitudeStats` that computes several descriptive statistics (mean, standard deviation, and median) for the magnitude. 
Instead of defining a separate feature for each statistic, we will return them all within a single dictionary.\n", "\n", "Then, we will implement the `flatten_feature()` method to transform this dictionary into a set of columns with custom names (e.g., `mag_mean`, `mag_std`). This approach keeps the feature extraction logic organized while providing full control over the final data representation:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "__main__.MagnitudeStats" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class MagnitudeStats(feets.Extractor):\n", "    # This extractor computes multiple stats and returns them as a dictionary.\n", "    features = [\"mag_stats\"]\n", "\n", "    def extract(self, magnitude):\n", "        stats = {\n", "            \"mean\": np.mean(magnitude),\n", "            \"std\": np.std(magnitude),\n", "            \"median\": np.median(magnitude),\n", "        }\n", "        return {\"mag_stats\": stats}\n", "\n", "    # Implement the custom flattening logic.\n", "    def flatten_feature(self, feature_name, feature_value):\n", "        if feature_name != \"mag_stats\":\n", "            # For features other than \"mag_stats\", use the default behavior.\n", "            return super().flatten_feature(feature_name, feature_value)\n", "\n", "        # For \"mag_stats\", we expect feature_value to be a dictionary.\n", "        # We will prepend \"mag_\" to each stat name to create the column names.\n", "        return {f\"mag_{k}\": v for k, v in feature_value.items()}\n", "\n", "\n", "# Register the new extractor\n", "extractor_registry.register_extractor(MagnitudeStats)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
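<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th>Features</th>\n", "      <th>mag_mean</th>\n", "      <th>mag_std</th>\n", "      <th>mag_median</th>\n", "    </tr>\n", "    <tr>\n", "      <th>Light Curve</th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>10.36</td>\n", "      <td>0.185472</td>\n", "      <td>10.4</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>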
" ], "text/plain": [ "Features     mag_mean   mag_std  mag_median\n", "Light Curve                                \n", "0               10.36  0.185472        10.4" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a FeatureSpace with our new feature\n", "fs_stats = feets.FeatureSpace(only=[\"mag_stats\"])\n", "\n", "# Extract features and display the flattened DataFrame\n", "# The columns will be \"mag_mean\", \"mag_std\", and \"mag_median\"\n", "# thanks to our custom flatten_feature method.\n", "features = fs_stats.extract(magnitude=magnitude)\n", "features.as_frame()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "extractor_registry.unregister_extractor(TimeDuration)\n", "extractor_registry.unregister_extractor(MaxMagMinTime)\n", "extractor_registry.unregister_extractor(QuantileMagnitude)\n", "extractor_registry.unregister_extractor(PolynomialFit)\n", "extractor_registry.unregister_extractor(MagnitudeStats)" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 2 }