{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "nbsphinx": "hidden" }, "outputs": [], "source": [ "from __future__ import print_function, unicode_literals\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "import seaborn as sns\n", "sns.set()\n", "\n", "plt.rcParams[\"figure.figsize\"] = 9, 4.51\n", "\n", "import expectexception\n", "\n", "from tutorial import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial\n", "\n", "## Introduction\n", "\n", "A time series is a sequence of observations, or data points, arranged in order of their time of occurrence. The hourly measurement of wind speeds in meteorology, the minute-by-minute recording of electrical activity along the scalp in electroencephalography, and the weekly changes of stock prices in finance are just some examples of time series, among many others.\n", "Some of the following properties may be observed in time series data [[gutsequential](http://www.statistik-mathematik.uni-wuerzburg.de/fileadmin/10040800/user_upload/time_series/the_book/2011-March-01-times.pdf)]:\n", "\n", "- the data are not generated independently\n", "- their dispersion varies in time\n", "- they are often governed by a trend and/or have cyclic components\n", "\n", "The study and analysis of time series can have multiple ends: to gain a better understanding of the mechanism generating the data, to predict future outcomes and behaviors, to classify and characterize events, or more." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
Feature | \n", "Computed with | \n", "Dependencies | \n", "Input Data | \n", " \n", " \n", " \n", "
---|---|---|---|
Amplitude | \n", "\n", " | \n", " | magnitude | \n", "
AndersonDarling | \n", "\n", " | \n", " | magnitude | \n", "
Autocor_length | \n", "\n", " | \n", " | magnitude | \n", "
Beyond1Std | \n", "\n", " | \n", " | magnitude, error | \n", "
CAR_mean | \n", "CAR_sigma, CAR_tau | \n", "\n", " | magnitude, error, time | \n", "
CAR_sigma | \n", "CAR_mean, CAR_tau | \n", "\n", " | magnitude, error, time | \n", "
CAR_tau | \n", "CAR_mean, CAR_sigma | \n", "\n", " | magnitude, error, time | \n", "
Color | \n", "\n", " | \n", " | magnitude, magnitude2 | \n", "
Con | \n", "\n", " | \n", " | magnitude | \n", "
Eta_color | \n", "\n", " | \n", " | aligned_magnitude2, aligned_magnitude, aligned_time | \n", "
Eta_e | \n", "\n", " | \n", " | magnitude, time | \n", "
FluxPercentileRatioMid20 | \n", "\n", " | \n", " | magnitude | \n", "
FluxPercentileRatioMid35 | \n", "\n", " | \n", " | magnitude | \n", "
FluxPercentileRatioMid50 | \n", "\n", " | \n", " | magnitude | \n", "
FluxPercentileRatioMid65 | \n", "\n", " | \n", " | magnitude | \n", "
FluxPercentileRatioMid80 | \n", "\n", " | \n", " | magnitude | \n", "
Freq{i}_harmonics_amplitude_{j} | \n", "Freq{i}_harmonics_amplitude_{j} and Freq{i}_harmonics_rel_phase_{j} | \n", "\n", " | magnitude, time | \n", "
Gskew | \n", "\n", " | \n", " | magnitude | \n", "
LinearTrend | \n", "\n", " | \n", " | magnitude, time | \n", "
MaxSlope | \n", "\n", " | \n", " | magnitude, time | \n", "
Mean | \n", "\n", " | \n", " | magnitude | \n", "
Meanvariance | \n", "\n", " | \n", " | magnitude | \n", "
MedianAbsDev | \n", "\n", " | \n", " | magnitude | \n", "
MedianBRP | \n", "\n", " | \n", " | magnitude | \n", "
PairSlopeTrend | \n", "\n", " | \n", " | magnitude | \n", "
PercentAmplitude | \n", "\n", " | \n", " | magnitude | \n", "
PercentDifferenceFluxPercentile | \n", "\n", " | \n", " | magnitude | \n", "
PeriodLS | \n", "Period_fit, Psi_eta, Psi_CS | \n", "\n", " | magnitude, time | \n", "
Period_fit | \n", "Psi_eta, PeriodLS, Psi_CS | \n", "\n", " | magnitude, time | \n", "
Psi_CS | \n", "Period_fit, Psi_eta, PeriodLS | \n", "\n", " | magnitude, time | \n", "
Psi_eta | \n", "Period_fit, PeriodLS, Psi_CS | \n", "\n", " | magnitude, time | \n", "
Q31 | \n", "\n", " | \n", " | magnitude | \n", "
Q31_color | \n", "\n", " | \n", " | aligned_magnitude2, aligned_magnitude | \n", "
Rcs | \n", "\n", " | \n", " | magnitude | \n", "
Skew | \n", "\n", " | \n", " | magnitude | \n", "
SlottedA_length | \n", "\n", " | \n", " | magnitude, time | \n", "
SmallKurtosis | \n", "\n", " | \n", " | magnitude | \n", "
Std | \n", "\n", " | \n", " | magnitude | \n", "
StetsonJ | \n", "\n", " | \n", " | aligned_magnitude2, aligned_magnitude, aligned_error, aligned_error2 | \n", "
StetsonK | \n", "\n", " | \n", " | magnitude, error | \n", "
StetsonK_AC | \n", "\n", " | \n", " | magnitude, error, time | \n", "
StetsonL | \n", "\n", " | \n", " | aligned_magnitude2, aligned_magnitude, aligned_error, aligned_error2 | \n", "
StructureFunction_index_21 | \n", "StructureFunction_index_32, StructureFunction_index_31 | \n", "\n", " | magnitude, time | \n", "
StructureFunction_index_31 | \n", "StructureFunction_index_21, StructureFunction_index_32 | \n", "\n", " | magnitude, time | \n", "
StructureFunction_index_32 | \n", "StructureFunction_index_21, StructureFunction_index_31 | \n", "\n", " | magnitude, time | \n", "
Feature | \n", "Value | \n", " \n", " \n", " \n", "
---|---|
Std | \n", "0.141573174959 | \n", "
StetsonL | \n", "0.58237036372 | \n", "
Feature | \n", "Value | \n", " \n", " \n", " \n", "
---|---|
Amplitude | \n", "0.265 | \n", "
AndersonDarling | \n", "1.0 | \n", "
Autocor_length | \n", "1.0 | \n", "
Con | \n", "0.0 | \n", "
Eta_e | \n", "905.636200812 | \n", "
FluxPercentileRatioMid20 | \n", "0.0913140311804 | \n", "
FluxPercentileRatioMid35 | \n", "0.178173719376 | \n", "
FluxPercentileRatioMid50 | \n", "0.316258351893 | \n", "
FluxPercentileRatioMid65 | \n", "0.523385300668 | \n", "
FluxPercentileRatioMid80 | \n", "0.799554565702 | \n", "
Freq1_harmonics_amplitude_0 | \n", "0.132971918867 | \n", "
Freq1_harmonics_amplitude_1 | \n", "0.0770819007194 | \n", "
Freq1_harmonics_amplitude_2 | \n", "0.0497038938234 | \n", "
Freq1_harmonics_amplitude_3 | \n", "0.0253287258167 | \n", "
Freq1_harmonics_rel_phase_0 | \n", "0.0 | \n", "
Freq1_harmonics_rel_phase_1 | \n", "0.115067718485 | \n", "
Freq1_harmonics_rel_phase_2 | \n", "0.334299267194 | \n", "
Freq1_harmonics_rel_phase_3 | \n", "0.530855576474 | \n", "
Freq2_harmonics_amplitude_0 | \n", "0.0181114566827 | \n", "
Freq2_harmonics_amplitude_1 | \n", "0.0090622502799 | \n", "
Freq2_harmonics_amplitude_2 | \n", "0.00260631629629 | \n", "
Freq2_harmonics_amplitude_3 | \n", "0.00439984317964 | \n", "
Freq2_harmonics_rel_phase_0 | \n", "0.0 | \n", "
Freq2_harmonics_rel_phase_1 | \n", "0.641672610593 | \n", "
Freq2_harmonics_rel_phase_2 | \n", "1.68983485975 | \n", "
Freq2_harmonics_rel_phase_3 | \n", "-1.20002370194 | \n", "
Freq3_harmonics_amplitude_0 | \n", "0.0167797089051 | \n", "
Freq3_harmonics_amplitude_1 | \n", "0.00322942122796 | \n", "
Freq3_harmonics_amplitude_2 | \n", "0.00366435564108 | \n", "
Freq3_harmonics_amplitude_3 | \n", "0.00435784547109 | \n", "
Freq3_harmonics_rel_phase_0 | \n", "0.0 | \n", "
Freq3_harmonics_rel_phase_1 | \n", "0.441761624287 | \n", "
Freq3_harmonics_rel_phase_2 | \n", "-0.0264019762482 | \n", "
Freq3_harmonics_rel_phase_3 | \n", "-0.361561445702 | \n", "
Gskew | \n", "0.2455 | \n", "
LinearTrend | \n", "6.17365857681e-06 | \n", "
MaxSlope | \n", "54.7252583612 | \n", "
Mean | \n", "-5.91798911223 | \n", "
Meanvariance | \n", "-0.0239225135894 | \n", "
MedianAbsDev | \n", "0.0545 | \n", "
MedianBRP | \n", "0.745393634841 | \n", "
PairSlopeTrend | \n", "0.0333333333333 | \n", "
PercentAmplitude | \n", "-0.113085757398 | \n", "
PercentDifferenceFluxPercentile | \n", "-0.0752787325006 | \n", "
PeriodLS | \n", "0.936942217405 | \n", "
Period_fit | \n", "0.0 | \n", "
Psi_CS | \n", "0.188077038434 | \n", "
Psi_eta | \n", "0.707845086624 | \n", "
Q31 | \n", "0.141 | \n", "
Rcs | \n", "0.0391714507727 | \n", "
Skew | \n", "0.956469867559 | \n", "
SlottedA_length | \n", "1.0 | \n", "
SmallKurtosis | \n", "1.37947868013 | \n", "
Std | \n", "0.141573174959 | \n", "
StructureFunction_index_21 | \n", "2.04757219899 | \n", "
StructureFunction_index_31 | \n", "3.12766185693 | \n", "
StructureFunction_index_32 | \n", "1.69906462906 | \n", "
Feature | \n", "Value | \n", " \n", " \n", " \n", "
---|---|
Beyond1Std | \n", "0.222780569514 | \n", "
Mean | \n", "-5.91798911223 | \n", "
Feature | \n", "Value | \n", " \n", " \n", " \n", "
---|---|
Mean | \n", "-5.91798911223 | \n", "
Feature | \n", "Value | \n", " \n", " \n", " \n", "
---|---|
Amplitude | \n", "0.265 | \n", "
AndersonDarling | \n", "1.0 | \n", "
Autocor_length | \n", "1.0 | \n", "
Beyond1Std | \n", "0.222780569514 | \n", "
CAR_mean | \n", "-9.2306988739 | \n", "
CAR_sigma | \n", "-0.219280492988 | \n", "
CAR_tau | \n", "0.641120373773 | \n", "
Color | \n", "-0.333255024533 | \n", "
Con | \n", "0.0 | \n", "
Eta_color | \n", "12930.6852576 | \n", "
Eta_e | \n", "905.636200812 | \n", "
FluxPercentileRatioMid20 | \n", "0.0913140311804 | \n", "
FluxPercentileRatioMid35 | \n", "0.178173719376 | \n", "
FluxPercentileRatioMid50 | \n", "0.316258351893 | \n", "
FluxPercentileRatioMid65 | \n", "0.523385300668 | \n", "
FluxPercentileRatioMid80 | \n", "0.799554565702 | \n", "
Freq1_harmonics_amplitude_0 | \n", "0.132971918867 | \n", "
Freq1_harmonics_amplitude_1 | \n", "0.0770819007194 | \n", "
Freq1_harmonics_amplitude_2 | \n", "0.0497038938234 | \n", "
Freq1_harmonics_amplitude_3 | \n", "0.0253287258167 | \n", "
Freq1_harmonics_rel_phase_0 | \n", "0.0 | \n", "
Freq1_harmonics_rel_phase_1 | \n", "0.115067718485 | \n", "
Freq1_harmonics_rel_phase_2 | \n", "0.334299267194 | \n", "
Freq1_harmonics_rel_phase_3 | \n", "0.530855576474 | \n", "
Freq2_harmonics_amplitude_0 | \n", "0.0181114566827 | \n", "
Freq2_harmonics_amplitude_1 | \n", "0.0090622502799 | \n", "
Freq2_harmonics_amplitude_2 | \n", "0.00260631629629 | \n", "
Freq2_harmonics_amplitude_3 | \n", "0.00439984317964 | \n", "
Freq2_harmonics_rel_phase_0 | \n", "0.0 | \n", "
Freq2_harmonics_rel_phase_1 | \n", "0.641672610593 | \n", "
Freq2_harmonics_rel_phase_2 | \n", "1.68983485975 | \n", "
Freq2_harmonics_rel_phase_3 | \n", "-1.20002370194 | \n", "
Freq3_harmonics_amplitude_0 | \n", "0.0167797089051 | \n", "
Freq3_harmonics_amplitude_1 | \n", "0.00322942122796 | \n", "
Freq3_harmonics_amplitude_2 | \n", "0.00366435564108 | \n", "
Freq3_harmonics_amplitude_3 | \n", "0.00435784547109 | \n", "
Freq3_harmonics_rel_phase_0 | \n", "0.0 | \n", "
Freq3_harmonics_rel_phase_1 | \n", "0.441761624287 | \n", "
Freq3_harmonics_rel_phase_2 | \n", "-0.0264019762482 | \n", "
Freq3_harmonics_rel_phase_3 | \n", "-0.361561445702 | \n", "
Gskew | \n", "0.2455 | \n", "
LinearTrend | \n", "6.17365857681e-06 | \n", "
MaxSlope | \n", "54.7252583612 | \n", "
Mean | \n", "-5.91798911223 | \n", "
Meanvariance | \n", "-0.0239225135894 | \n", "
MedianAbsDev | \n", "0.0545 | \n", "
MedianBRP | \n", "0.745393634841 | \n", "
PairSlopeTrend | \n", "0.0333333333333 | \n", "
PercentAmplitude | \n", "-0.113085757398 | \n", "
PercentDifferenceFluxPercentile | \n", "-0.0752787325006 | \n", "
PeriodLS | \n", "0.936942217405 | \n", "
Period_fit | \n", "0.0 | \n", "
Psi_CS | \n", "0.188077038434 | \n", "
Psi_eta | \n", "0.707845086624 | \n", "
Q31 | \n", "0.141 | \n", "
Q31_color | \n", "0.106 | \n", "
Rcs | \n", "0.0391714507727 | \n", "
Skew | \n", "0.956469867559 | \n", "
SlottedA_length | \n", "1.0 | \n", "
SmallKurtosis | \n", "1.37947868013 | \n", "
Std | \n", "0.141573174959 | \n", "
StetsonJ | \n", "1.39841114014 | \n", "
StetsonK | \n", "0.690626626289 | \n", "
StetsonK_AC | \n", "0.812563161458 | \n", "
StetsonL | \n", "0.58237036372 | \n", "
StructureFunction_index_21 | \n", "2.04757219899 | \n", "
StructureFunction_index_31 | \n", "3.12766185693 | \n", "
StructureFunction_index_32 | \n", "1.69906462906 | \n", "
\n", "
\n", "\n", "Amplitude
\n", "The amplitude is defined as half of the difference between the median of the highest 5% and the median of the lowest 5% of the magnitudes. For a sequence of numbers from 0 to 1000, the amplitude should be equal to 475.5.
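The definition can be checked with a minimal sketch in plain Python. The function name `amplitude` and the choice to round the 5% tail size down are assumptions made for illustration, not the library's implementation; with that rounding, the sequence 0 to 1000 reproduces the quoted value.

```python
from statistics import median


def amplitude(magnitudes):
    """Half the difference between the medians of the top and bottom 5%."""
    sorted_mags = sorted(magnitudes)
    # Assumption: the tail size is 5% of the sample, rounded down (at least 1).
    n_tail = max(1, int(len(sorted_mags) * 0.05))
    low = median(sorted_mags[:n_tail])
    high = median(sorted_mags[-n_tail:])
    return (high - low) / 2.0


print(amplitude(range(1001)))  # 475.5
```

Rounding the tail size up instead would give 475.0 for the same input, so the exact value depends on that convention.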
\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "AndersonDarling
\n", "The Anderson-Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. When applied to testing if a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality.
\n", "For a normal distribution the Anderson-Darling statistic should take values close to 0.25.
\n", "
References
\n", "\n", "\n", " \n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2009trending \n", "Kim, D. W., Protopapas, P., Alcock, C., Byun, Y. I., & Bianco, F. (2009). De-Trending Time Series for Astronomical Variability Surveys. Monthly Notices of the Royal Astronomical Society, 397(1), 558-568. Doi:10.1111/j.1365-2966.2009.14967.x. \n", "
\n", "
\n", "\n", "Autocor_length
\n", "The autocorrelation, also known as serial correlation, is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.
\n", "For an observed series\n", " \\(y_1, y_2,\\dots,y_T\\)\n", " with sample mean\n", " \\(\\bar{y}\\)\n", " , the sample autocorrelation at lag\n", " \\(h\\)\n", " is given by:
\n", " \\(\\rho_h = \\frac{\\sum_{t=h+1}^T (y_t - \\bar{y})(y_{t-h}-\\bar{y})}\n", " {\\sum_{t=1}^T (y_t - \\bar{y})^2}\\)\n", "Since the autocorrelation function of a light curve is a vector and a feature must be a single value, we define the autocorrelation length as the first lag at which the autocorrelation function falls below\n", " \\(e^{-1}\\)\n", " .
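A minimal sketch of this definition in plain Python follows. The function names and the `max_lag` cut-off are hypothetical, introduced only for illustration; the sketch follows the formula for the sample autocorrelation and returns the first lag whose value drops below e^{-1}.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((y - mean) ** 2 for y in values)
    numerator = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def autocor_length(values, max_lag=100):
    """First lag at which the autocorrelation falls below exp(-1)."""
    threshold = math.exp(-1)
    for lag in range(1, min(max_lag, len(values) - 1) + 1):
        if autocorrelation(values, lag) < threshold:
            return lag
    return None  # the threshold was never crossed within max_lag


# A strictly alternating series decorrelates after a single step.
alternating = [1.0, -1.0] * 50
print(autocor_length(alternating))  # 1
```

A slowly varying signal, such as a sinusoid sampled many times per cycle, yields a larger length because neighbouring points stay correlated for more lags.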
\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "
100
\n",
" \n", "
\n", "\n", "Beyond1Std
\n", "Fraction of points lying more than one standard deviation from the weighted mean. For a normal distribution, it should take a value close to 0.32:
\n", ">>> fs = feets.FeatureSpace(only=['Beyond1Std'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Beyond1Std': 0.317}\n", "
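The same quantity can be sketched without the library. In the sketch below the weights are taken as the inverse squared errors and the standard deviation is the unweighted population value; both choices, like the function name, are assumptions for illustration.

```python
import random
from statistics import pstdev


def beyond_1_std(magnitudes, errors):
    """Fraction of points more than one std away from the weighted mean."""
    # Assumption: each point is weighted by 1 / error^2.
    weights = [1.0 / e ** 2 for e in errors]
    weighted_mean = sum(w * m for w, m in zip(weights, magnitudes)) / sum(weights)
    std = pstdev(magnitudes)
    outside = sum(1 for m in magnitudes if abs(m - weighted_mean) > std)
    return outside / len(magnitudes)


rng = random.Random(42)
mags = [rng.gauss(0.0, 1.0) for _ in range(10000)]
errs = [1.0] * len(mags)
fraction = beyond_1_std(mags, errs)
print(fraction)  # close to 0.32 for Gaussian data
```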
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
(Brockwell and Davis, 2002), a continuous-time autoregressive model.
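\n", "The CAR process has three parameters and provides a natural and consistent way of estimating a characteristic time scale and variance of light-curves. It is described by the following stochastic differential equation:\n", " \\(dX(t) = -\\frac{1}{\\tau} X(t)\\,dt + \\sigma_C \\sqrt{dt}\\,\\epsilon(t) + b\\,dt, \\quad \\tau, \\sigma_C, t \\geq 0\\)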
\n", "where the mean value of the light-curve\n", " \\(X(t)\\)\n", " is\n", " \\(b\\tau\\)\n", " and the variance is\n", " \\(\\frac{\\tau\\sigma_C^2}{2}\\)\n", " .\n", " \\(\\tau\\)\n", " is the relaxation time of the process\n", " \\(X(t)\\)\n", " ; it can be interpreted as describing the variability amplitude of the time series.\n", " \\(\\sigma_C\\)\n", " can be interpreted as describing the variability of the time series on time scales shorter than\n", " \\(\\tau\\)\n", " .\n", " \\(\\epsilon(t)\\)\n", " is a white noise process with zero mean and variance equal to one.
\n", "The likelihood function of a CAR model for a light-curve with observations\n", " \\(x = \\{x_1, \\dots, x_n\\}\\)\n", " taken at times\n", " \\(\\{t_1, \\dots, t_n\\}\\)\n", " with measurement error variances\n", " \\(\\{\\delta_1^2, \\dots, \\delta_n^2\\}\\)\n", " is:
\n", "To find the optimal parameters we maximize the likelihood with respect to\n", " \\(\\sigma_C\\)\n", " and\n", " \\(\\tau\\)\n", " and calculate\n", " \\(b\\)\n", " as the mean magnitude of the light-curve divided by\n", " \\(\\tau\\)\n", " .
\n", ">>> fs = feets.FeatureSpace(\n", "... only=['CAR_sigma', 'CAR_tau','CAR_mean'])\n", ">>> features, values = fs.extract(**lc_periodic)\n", ">>> dict(zip(features, values))\n", "{'CAR_mean': -9.230698873903961,\n", " 'CAR_sigma': -0.21928049298842511,\n", " 'CAR_tau': 0.64112037377348619}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "pichara2012improved \n", "Pichara, K., Protopapas, P., Kim, D. W., Marquette, J. B., & Tisserand, P. (2012). An improved quasar detection method in EROS-2 and MACHO LMC data sets. Monthly Notices of the Royal Astronomical Society, 427(2), 1284-1297. Doi:10.1111/j.1365-2966.2012.22061.x. \n", "
nelder-mead
\n",
" \n", "
\n", "\n", "Color
\n", "The color is defined as the difference between the average magnitudes of observations in two different bands.
\n", ">>> fs = feets.FeatureSpace(only=['Color'])\n", ">>> features, values = fs.extract(**lc)\n", ">>> dict(zip(features, values))\n", "{'Color': -0.33325502453332145}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "
\n", "
\n", "\n", "Con
\n", "Index introduced for the selection of variable stars from the OGLE database (Wozniak 2000). To calculate Con, we count the sets of three consecutive data points that are all brighter or fainter than\n", " \\(2\\sigma\\)\n", " and normalize that count by\n", " \\(N-2\\)\n", " .
\n", "For a normal distribution and by considering just one star, Con should take values close to 0.045:
\n", ">>> fs = feets.FeatureSpace(only=['Con'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Con': 0.0476}\n", "
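As a rough illustration of the counting rule, here is a sketch in plain Python. It assumes that "brighter or fainter than 2σ" means more than two population standard deviations away from the mean on either side; the function name and the `consecutive` parameter are hypothetical.

```python
from statistics import mean, pstdev


def con(magnitudes, consecutive=3):
    """Fraction of windows of `consecutive` points that all lie beyond 2 sigma."""
    center = mean(magnitudes)
    limit = 2.0 * pstdev(magnitudes)
    # Assumption: a point counts if it deviates from the mean by more than 2 sigma.
    is_outlier = [abs(m - center) > limit for m in magnitudes]
    n_windows = len(magnitudes) - consecutive + 1  # N - 2 when consecutive = 3
    count = sum(
        1 for i in range(n_windows) if all(is_outlier[i:i + consecutive])
    )
    return count / n_windows


# One burst of three bright points in an otherwise flat series.
series = [0.0] * 50 + [10.0] * 3 + [0.0] * 50
print(con(series))  # one qualifying window out of 101
```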
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "
3
\n",
" \n", "
\n", "\n", "Eta_color (\n", " \\(\\eta_{color}\\)\n", " )
\n", "Variability index Eta_e (\n", " \\(\\eta^e\\)\n", " ) calculated from the color light-curve.
\n", ">>> fs = feets.FeatureSpace(only=['Eta_color'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Eta_color': 1.991749074648397}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2014epoch \n", "Kim, D. W., Protopapas, P., Bailer-Jones, C. A., Byun, Y. I., Chang, S. W., Marquette, J. B., & Shin, M. S. (2014). The EPOCH Project: I. Periodic Variable Stars in the EROS-2 LMC Database. arXiv preprint Doi:10.1051/0004-6361/201323252. \n", "
\n", "
\n", "\n", "Eta_e (\n", " \\(\\eta^e\\)\n", " )
\n", "Variability index\n", " \\(\\eta\\)\n", " is the ratio of the mean of the square of successive differences to the variance of data points. The index was originally proposed to check whether the successive data points are independent or not. In other words, the index was developed to check if any trends exist in the data (von Neumann 1941). It is defined as:
\n", " \\(\\eta = \\frac{1}{(N-1)\\sigma^2}\n", " \\sum_{i=1}^{N-1} (m_{i+1}-m_i)^2\\)\n", "The variability index should take a value close to 2 for a normal distribution.
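The formula for η can be sketched in a few lines of plain Python. The function name is illustrative, and using the population variance for σ² is an assumption; for large N the choice between population and sample variance barely matters.

```python
import random
from statistics import pvariance


def eta(magnitudes):
    """Von Neumann ratio: mean squared successive difference over the variance."""
    n = len(magnitudes)
    squared_diffs = sum(
        (magnitudes[i + 1] - magnitudes[i]) ** 2 for i in range(n - 1)
    )
    # Assumption: sigma^2 is the population variance of the magnitudes.
    return squared_diffs / ((n - 1) * pvariance(magnitudes))


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(eta(noise))              # close to 2 for uncorrelated Gaussian noise
print(eta(list(range(100))))   # much smaller for a strong trend
```

The second call shows why the index is useful: a smooth trend makes successive differences small relative to the overall variance, which pulls η towards zero.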
\n", "Although\n", " \\(\\eta\\)\n", " is a powerful index for quantifying variability characteristics of a time series, it does not take into account unequal sampling. Thus\n", " \\(\\eta^e\\)\n", " is defined as:
\n", " \\(\\eta^e = \\bar{w} \\, (t_{N-1} - t_1)^2\n", " \\frac{\\sum_{i=1}^{N-1} w_i (m_{i+1} - m_i)^2}\n", " {\\sigma^2 \\sum_{i=1}^{N-1} w_i}\\)\n", "Where:
\n", " \\(w_i = \\frac{1}{(t_{i+1} - t_i)^2}\\)\n", "Example:
\n", ">>> fs = feets.FeatureSpace(only=['Eta_e'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Eta_e': 2.0028592616231866}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2014epoch \n", "Kim, D. W., Protopapas, P., Bailer-Jones, C. A., Byun, Y. I., Chang, S. W., Marquette, J. B., & Shin, M. S. (2014). The EPOCH Project: I. Periodic Variable Stars in the EROS-2 LMC Database. arXiv preprint Doi:10.1051/0004-6361/201323252. \n", "
\n", "
In order to characterize the distribution of the sorted magnitudes we use percentiles. If\n", " \\(F_{5, 95}\\)\n", " is the difference between the 95th and 5th percentile magnitude values, we calculate the following:
\n", "For the first feature, for example, in the case of a normal distribution, this is equivalent to calculating:
\n", " \\(\\frac{erf^{-1}(2 \\cdot 0.6-1)-erf^{-1}(2 \\cdot 0.4-1)}\n", " {erf^{-1}(2 \\cdot 0.95-1)-erf^{-1}(2 \\cdot 0.05-1)}\\)\n", "So, the expected values for each of the flux percentile features are:
\n", "References
\n", "richards2011machine | \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. | \n", "
---|
\n", "
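The expected values for a normal distribution can be evaluated numerically. The sketch below assumes that each `FluxPercentileRatioMidXX` feature is the width of the central XX% of the distribution divided by the 5%-95% width, which is how the formula for the first feature reads. It uses the standard normal quantile function in place of the inverse error function; the two differ only by a factor of the square root of 2, which cancels in the ratio.

```python
from statistics import NormalDist


def expected_flux_ratio(mid_fraction):
    """Expected ratio of the central `mid_fraction` width to the 5%-95% width."""
    quantile = NormalDist().inv_cdf  # standard normal quantile function
    lower = 0.5 - mid_fraction / 2.0
    upper = 0.5 + mid_fraction / 2.0
    return (quantile(upper) - quantile(lower)) / (quantile(0.95) - quantile(0.05))


# Assumption: the suffixes 20, 35, 50, 65 and 80 are central percentages.
for percent in (20, 35, 50, 65, 80):
    print(percent, round(expected_flux_ratio(percent / 100.0), 3))
```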
\n", "\n", "Periodic features extracted from light-curves using Lomb–Scargle (Richards et al., 2011)
\n", "Here, we adopt a model where the time series of the photometric magnitudes of variable stars is modeled as a superposition of sines and cosines:
\n", " \\(y_i(t|f_i) = a_i\\sin(2\\pi f_i t) + b_i\\cos(2\\pi f_i t) + b_{i,\\circ}\\)\n", "where\n", " \\(a\\)\n", " and\n", " \\(b\\)\n", " are normalization constants for the sinusoids of frequency\n", " \\(f_i\\)\n", " and\n", " \\(b_{i,\\circ}\\)\n", " is the magnitude offset.
\n", "To find periodic variations in the data, we fit the equation above by minimizing the sum of squares, which we denote\n", " \\(\\chi^2\\)\n", " :
\n", " \\(\\chi^2 = \\sum_k \\frac{(d_k - y_i(t_k))^2}{\\sigma_k^2}\\)\n", "where\n", " \\(\\sigma_k\\)\n", " is the measurement uncertainty in data point\n", " \\(d_k\\)\n", " . We allow the mean to float, leading to more robust period estimates in the case where the periodic phase is not uniformly sampled; in these cases, the model light curve has a non-zero mean. This can be important when searching for periods on the order of the data span\n", " \\(T_{tot}\\)\n", " . Now, define
\n", " \\(\\chi^2_{\\circ} = \\sum_k \\frac{(d_k - \\mu)^2}{\\sigma_k^2}\\)\n", "where\n", " \\(\\mu\\)\n", " is the weighted mean
\n", " \\(\\mu = \\frac{\\sum_k d_k / \\sigma_k^2}{\\sum_k 1/\\sigma_k^2}\\)\n", "Then, the generalized Lomb-Scargle periodogram is:
\n", " \\(P_f(f) = \\frac{(N-1)}{2} \\frac{\\chi_{\\circ}^2 - \\chi_m^2(f)}\n", " {\\chi_{\\circ}^2}\\)\n", "where\n", " \\(\\chi_m^2(f)\\)\n", " is\n", " \\(\\chi^2\\)\n", " minimized with respect to\n", " \\(a, b\\)\n", " and\n", " \\(b_{\\circ}\\)\n", " .
\n", "Following Debosscher et al. (2007), we fit each light curve with a linear term plus a harmonic sum of sinusoids:
\n", " \\(y(t) = ct + \\sum_{i=1}^{3}\\sum_{j=1}^{4} y_i(t|jf_i)\\)\n", "where each of the three test frequencies\n", " \\(f_i\\)\n", " is allowed to have four harmonics at frequencies\n", " \\(f_{i,j} = jf_i\\)\n", " . The three test frequencies\n", " \\(f_i\\)\n", " are found iteratively, by successively finding and removing the periodic signal that produces a peak in\n", " \\(P_f(f)\\)\n", " , where\n", " \\(P_f(f)\\)\n", " is the Lomb-Scargle periodogram as defined above.
\n", "Given a peak in\n", " \\(P_f(f)\\)\n", " , we whiten the data with respect to that frequency by fitting away a model containing that frequency as well as components with frequencies at 2, 3, and 4 times that fundamental frequency (harmonics). Then, we subtract that model from the data, update\n", " \\(\\chi_{\\circ}^2\\)\n", " , and recalculate\n", " \\(P_f(f)\\)\n", " to find more periodic components.
\n", "Algorithm:
\n", "\n", "
\n", "- For\n", " \\(i = {1, 2, 3}\\)\n", "
\n", "- Calculate Lomb-Scargle periodogram\n", " \\(P_f(f)\\)\n", " for light curve.
\n", "- Find peak in\n", " \\(P_f(f)\\)\n", " , subtract that model from data.
\n", "- Update\n", " \\(\\chi_{\\circ}^2\\)\n", " , return to Step 1.
\n", "Then, the features extracted are given as an amplitude and a phase:
\n", "\\begin{align*}\n", "A_{i,j} = \\sqrt{a_{i,j}^2 + b_{i,j}^2}\\\\\n", "\\textrm{PH}_{i,j} = \\arctan\\left(\\frac{b_{i,j}}{a_{i,j}}\\right)\n", "\\end{align*}\n", "where\n", " \\(A_{i,j}\\)\n", " is the amplitude of the\n", " \\(j\\)\n", " -th harmonic of the\n", " \\(i\\)\n", " -th frequency component and\n", " \\(\\textrm{PH}_{i,j}\\)\n", " is the phase component, which we then correct to a relative phase with respect to the phase of the first component:
\n", " \\(\\textrm{PH}'_{i,j} = \\textrm{PH}_{i,j} - \\textrm{PH}_{00}\\)\n", "and remapped to\n", " \\([-\\pi, +\\pi]\\)\n", "
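The conversion from fitted coefficients to amplitude and relative phase can be sketched as follows. The function names are hypothetical, and `math.atan2` is used as a quadrant-aware stand-in for the arctangent of b/a, which is an assumption rather than the library's documented behaviour.

```python
import math


def harmonic_amplitude_phase(a, b):
    """Amplitude and phase of the term a*sin(x) + b*cos(x)."""
    amplitude = math.hypot(a, b)
    # Assumption: atan2 replaces arctan(b / a) so that a = 0 is handled.
    phase = math.atan2(b, a)
    return amplitude, phase


def relative_phase(phase, reference_phase):
    """Phase relative to a reference, wrapped into [-pi, pi)."""
    shifted = phase - reference_phase
    return (shifted + math.pi) % (2.0 * math.pi) - math.pi


amp, ph = harmonic_amplitude_phase(3.0, 4.0)
print(amp)                                    # 5.0
print(relative_phase(1.5 * math.pi, 0.0))     # wraps to about -pi/2
```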
\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
{u'autopower_kwds': {u'nyquist_factor': 100, u'normalization': u'standard'}}
\n",
" \n", "
Median-of-magnitudes based measure of the skew.
\n", "\n", " \\(Gskew = m_{q3} + m_{q97} - 2m\\)\n", "\n", "\n", "Where:
\n", "\n", "
\n", "- \n", " \\(m_{q3}\\)\n", " is the median of the magnitudes less than or equal to the 3rd percentile.
\n", "- \n", " \\(m_{q97}\\)\n", " is the median of the magnitudes greater than or equal to the 97th percentile.
\n", "- \n", " \\(m\\)\n", " is the median of magnitudes.
\n", "
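A minimal sketch of this measure in plain Python is given below. The function name is illustrative, and the percentile convention (linear interpolation between the sample minimum and maximum) is an assumption; other conventions shift the result slightly for small samples.

```python
from statistics import median, quantiles


def gskew(magnitudes):
    """Median-based skew: m_q3 + m_q97 - 2 * median."""
    # Assumption: percentiles use the 'inclusive' interpolation method.
    cuts = quantiles(magnitudes, n=100, method="inclusive")
    q3, q97 = cuts[2], cuts[96]
    m_q3 = median([m for m in magnitudes if m <= q3])
    m_q97 = median([m for m in magnitudes if m >= q97])
    return m_q3 + m_q97 - 2.0 * median(magnitudes)


print(gskew(list(range(101))))  # 0.0 for a symmetric sample
```

A sample with a long tail towards large magnitudes gives a positive value, and a tail towards small magnitudes gives a negative one.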
\n", "
\n", "\n", "LinearTrend
\n", "Slope of a linear fit to the light-curve.
\n", ">>> fs = feets.FeatureSpace(only=['LinearTrend'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'LinearTrend': -3.2084065290292509e-06}\n", "
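For reference, the slope of an ordinary least-squares line can be computed directly. This is a sketch of the textbook formula with an illustrative function name, not necessarily the fitting routine the library calls.

```python
def linear_trend(times, magnitudes):
    """Slope of the ordinary least-squares line through (time, magnitude)."""
    n = len(times)
    t_mean = sum(times) / n
    m_mean = sum(magnitudes) / n
    covariance = sum(
        (t - t_mean) * (m - m_mean) for t, m in zip(times, magnitudes)
    )
    variance = sum((t - t_mean) ** 2 for t in times)
    return covariance / variance


times = [float(t) for t in range(10)]
mags = [2.0 * t + 1.0 for t in times]
print(linear_trend(times, mags))  # 2.0
```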
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "PeriodLS
\n", "The Lomb-Scargle (L-S) algorithm (Scargle, 1982) is a variation of the Discrete Fourier Transform (DFT), in which a time series is decomposed into a linear combination of sinusoidal functions. The basis of sinusoidal functions transforms the data from the time domain to the frequency domain. DFT techniques often assume evenly spaced data points in the time series, but this is rarely the case with astrophysical time-series data. Scargle derived a formula for transform coefficients that is similar to the DFT in the limit of evenly spaced observations. In addition, an adjustment of the values used to calculate the transform coefficients makes the transform invariant to time shifts.
\n", "The Lomb-Scargle periodogram is optimized to identify sinusoidal-shaped periodic signals in time-series data. Particular applications include radial velocity data and searches for pulsating variable stars. L-S is not optimal for detecting signals from transiting exoplanets, where the shape of the periodic light-curve is not sinusoidal.
\n", "Next, we perform a test on the synthetic periodic light-curve we created (whose period is 20) to confirm the accuracy of the period found by the L-S method.
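To make the idea concrete, here is a minimal sketch of the classical (unnormalized) Scargle periodogram in plain Python, applied to an unevenly sampled sinusoid of period 20. The function name `lomb_scargle_power`, the sampling, and the frequency grid are all assumptions chosen for illustration; this is not how feets computes `PeriodLS`.

```python
import math
import random


def lomb_scargle_power(time, magnitude, frequency):
    """Unnormalized classical Lomb-Scargle power at a single frequency."""
    mean = sum(magnitude) / len(magnitude)
    y = [m - mean for m in magnitude]
    omega = 2.0 * math.pi * frequency
    # The offset tau is what makes the result invariant to time shifts.
    s2 = sum(math.sin(2.0 * omega * t) for t in time)
    c2 = sum(math.cos(2.0 * omega * t) for t in time)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in time]
    sin_terms = [math.sin(omega * (t - tau)) for t in time]
    yc = sum(v * c for v, c in zip(y, cos_terms))
    ys = sum(v * s for v, s in zip(y, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return 0.5 * (yc * yc / cc + ys * ys / ss)


rng = random.Random(0)  # fixed seed so the irregular sampling is reproducible
time = sorted(rng.uniform(0.0, 200.0) for _ in range(150))
magnitude = [math.sin(2.0 * math.pi * t / 20.0) for t in time]

# Trial frequencies from 0.01 to 0.2 cycles per unit time.
frequencies = [0.01 + 0.0005 * k for k in range(381)]
best = max(frequencies, key=lambda f: lomb_scargle_power(time, magnitude, f))
period = 1.0 / best
print(period)
```

The recovered period is only as precise as the frequency grid allows, so a finer grid around the peak would be needed for a sharper estimate.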
\n", "Period_fit
\n", "The false alarm probability of the largest periodogram value. Let's test it for normally distributed data and for periodic data.
\n", "Psi_CS (\n", " \\(\\Psi_{CS}\\)\n", " )
\n", "\n", " \\(R_{CS}\\)\n", " applied to the phase-folded light curve (generated using the period estimated from the Lomb-Scargle method).
\n", "Psi_eta (\n", " \\(\\Psi_{\\eta}\\)\n", " )
\n", "\n", " \\(\\eta^e\\)\n", " index calculated from the folded light curve.
\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2014epoch \n", "Kim, D. W., Protopapas, P., Bailer-Jones, C. A., Byun, Y. I., Chang, S. W., Marquette, J. B., & Shin, M. S. (2014). The EPOCH Project: I. Periodic Variable Stars in the EROS-2 LMC Database. arXiv preprint Doi:10.1051/0004-6361/201323252. \n", "
{u'autopower_kwds': {u'nyquist_factor': 100, u'normalization': u'standard'}}
\n",
" {u'method': u'simple', u'normalization': u'standard'}
\n",
" \n", "
\n", "\n", "MaxSlope
\n", "Maximum absolute magnitude slope between two consecutive observations.
\n", "Examining successive (time-sorted) magnitudes, this is the maximal first difference (delta magnitude over delta time).
\n", ">>> fs = feets.FeatureSpace(only=['MaxSlope'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'MaxSlope': 5.4943105823904741}\n", "
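The definition can be sketched in a few lines of plain Python. The helper `max_slope` is hypothetical and assumes that no two observations share the same time stamp.

```python
def max_slope(time, magnitude):
    """Largest absolute slope between consecutive time-sorted observations."""
    pairs = sorted(zip(time, magnitude))
    return max(
        abs((m1 - m0) / (t1 - t0))
        for (t0, m0), (t1, m1) in zip(pairs, pairs[1:])
    )


time = [0.0, 1.0, 3.0, 4.0]
magnitude = [10.0, 10.5, 9.0, 9.2]
# Consecutive slopes are 0.5, -0.75 and about 0.2; the largest in absolute value wins.
steepest = max_slope(time, magnitude)
print(steepest)
```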
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
True
\n",
" \n", "
\n", "\n", "Mean
\n", "Mean magnitude. For a normal distribution it should take a value close to zero:
\n", ">>> fs = feets.FeatureSpace(only=['Mean'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Mean': 0.0082611563419413246}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2014epoch \n", "Kim, D. W., Protopapas, P., Bailer-Jones, C. A., Byun, Y. I., Chang, S. W., Marquette, J. B., & Shin, M. S. (2014). The EPOCH Project: I. Periodic Variable Stars in the EROS-2 LMC Database. arXiv preprint Doi:10.1051/0004-6361/201323252. \n", "
\n", "
\n", "\n", "Meanvariance (\n", " \\(\\frac{\\sigma}{\\bar{m}}\\)\n", " )
\n", "This is a simple variability index and is defined as the ratio of the standard deviation\n", " \\(\\sigma\\)\n", " , to the mean magnitude,\n", " \\(\\bar{m}\\)\n", " . If a light curve has strong variability,\n", " \\(\\frac{\\sigma}{\\bar{m}}\\)\n", " of the light curve is generally large.
\n", "For a uniform distribution from 0 to 1, the mean is equal to 0.5 and the variance is equal to 1/12, so the standard deviation is about 0.289 and the mean-variance should take a value close to 0.289 / 0.5 = 0.577:
\n", ">>> fs = feets.FeatureSpace(only=['Meanvariance'])\n", ">>> features, values = fs.extract(**lc_uniform)\n", ">>> dict(zip(features, values))\n", "{'Meanvariance': 0.5816791217381897}\n", "
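The expected value of 0.577 can be checked without random numbers by using an evenly spaced grid on (0, 1) as a stand-in for a uniform sample. The helper `mean_variance` is hypothetical, and the use of the population standard deviation is an assumption of this sketch.

```python
import statistics


def mean_variance(magnitude):
    """Ratio of the (population) standard deviation to the mean."""
    return statistics.pstdev(magnitude) / statistics.fmean(magnitude)


n = 1000
magnitude = [(i + 0.5) / n for i in range(n)]  # deterministic uniform-like grid
ratio = mean_variance(magnitude)
print(ratio)
```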
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "
\n", "
\n", "\n", "MedianAbsDev
\n", "The median absolute deviation is defined as the median discrepancy of the data from the median data:
\n", " \\(\\text{MedianAbsDev} = \\text{median}(|mag - \\text{median}(mag)|)\\)\n", "It should take a value close to 0.675 for a standard normal distribution:
\n", ">>> fs = feets.FeatureSpace(only=['MedianAbsDev'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'MedianAbsDev': 0.66332131466690614}\n", "
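A minimal sketch of the formula, together with a check of where the value 0.675 comes from: for a standard normal distribution the median absolute deviation equals the 75% quantile. The helper name `median_abs_dev` is hypothetical.

```python
import statistics
from statistics import NormalDist


def median_abs_dev(magnitude):
    """Median of the absolute deviations from the median."""
    med = statistics.median(magnitude)
    return statistics.median(abs(m - med) for m in magnitude)


# The outlier 100.0 barely affects the result, which is the point of the statistic.
mad = median_abs_dev([1.0, 2.0, 3.0, 4.0, 100.0])
print(mad)

# Theoretical value for a standard normal distribution.
expected_normal = NormalDist().inv_cdf(0.75)
print(expected_normal)
```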
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "MedianBRP (Median buffer range percentage)
\n", "Fraction (<= 1) of photometric points within amplitude/10 of the median magnitude
\n", ">>> fs = feets.FeatureSpace(only=['MedianBRP'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'MedianBRP': 0.559}\n", "
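The following sketch shows one way to read this definition. It assumes that "amplitude" means the full range (maximum minus minimum) and that the comparison is strict; both choices, and the helper name `median_brp`, are assumptions made for illustration.

```python
import statistics


def median_brp(magnitude):
    """Fraction of points closer to the median than one tenth of the range."""
    median = statistics.median(magnitude)
    tolerance = (max(magnitude) - min(magnitude)) / 10.0
    inside = sum(1 for m in magnitude if abs(m - median) < tolerance)
    return inside / len(magnitude)


magnitude = [0.0, 4.8, 5.0, 5.1, 10.0]
# Median is 5.0 and the tolerance is 1.0, so 3 of the 5 points qualify.
fraction = median_brp(magnitude)
print(fraction)
```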
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "PairSlopeTrend
\n", "Considering the last 30 (time-sorted) measurements of source magnitude, the fraction of increasing first differences minus the fraction of decreasing first differences.
\n", ">>> fs = feets.FeatureSpace(only=['PairSlopeTrend'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'PairSlopeTrend': -0.16666666666666666}\n", "
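A small sketch of this definition follows. The helper `pair_slope_trend` is hypothetical, and normalizing by the number of first differences is an assumption of this sketch; an implementation may divide by the window length instead.

```python
def pair_slope_trend(time, magnitude, window=30):
    """Fraction of increasing minus fraction of decreasing first differences."""
    ordered = [m for _, m in sorted(zip(time, magnitude))]
    last = ordered[-window:]  # keep only the most recent measurements
    diffs = [b - a for a, b in zip(last, last[1:])]
    increasing = sum(1 for d in diffs if d > 0)
    decreasing = sum(1 for d in diffs if d < 0)
    return (increasing - decreasing) / len(diffs)


time = [0, 1, 2, 3, 4, 5]
magnitude = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
# Four differences go up and one goes down: (4 - 1) / 5.
trend = pair_slope_trend(time, magnitude)
print(trend)
```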
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "PercentAmplitude
\n", "Largest percentage difference between either the max or min magnitude and the median.
\n", ">>> fs = feets.FeatureSpace(only=['PercentAmplitude'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'PercentAmplitude': -168.991253993057}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "PercentDifferenceFluxPercentile
\n", "Ratio of\n", " \\(F_{5, 95}\\)\n", " over the median magnitude.
\n", ">>> fs = feets.FeatureSpace(only=['PercentDifferenceFluxPercentile'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'PercentDifferenceFluxPercentile': -134.93590403825007}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "Q31 (\n", " \\(Q_{3-1}\\)\n", " )
\n", "\n", " \\(Q_{3-1}\\)\n", " is the difference between the third quartile,\n", " \\(Q_3\\)\n", " , and the first quartile,\n", " \\(Q_1\\)\n", " , of a raw light curve.\n", " \\(Q_1\\)\n", " is a split between the lowest 25% and the highest 75% of data.\n", " \\(Q_3\\)\n", " is a split between the lowest 75% and the highest 25% of data.
\n", ">>> fs = feets.FeatureSpace(only=['Q31'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Q31': 1.3320376563134508}\n", "
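The interquartile range can be sketched with the standard library. The helper `q31` is hypothetical, and the `inclusive` interpolation method is an assumption; other quantile conventions give slightly different values on small samples.

```python
import statistics


def q31(magnitude):
    """Difference between the third and the first quartile."""
    q1, _, q3 = statistics.quantiles(magnitude, n=4, method="inclusive")
    return q3 - q1


spread = q31([1.0, 2.0, 3.0, 4.0, 5.0])  # Q1 = 2.0 and Q3 = 4.0
print(spread)
```

For a standard normal distribution the theoretical value is about 1.349, which is consistent with the result shown above.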
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2014epoch \n", "Kim, D. W., Protopapas, P., Bailer-Jones, C. A., Byun, Y. I., Chang, S. W., Marquette, J. B., & Shin, M. S. (2014). The EPOCH Project: I. Periodic Variable Stars in the EROS-2 LMC Database. arXiv preprint Doi:10.1051/0004-6361/201323252. \n", "
\n", "
\n", "\n", "Q31_color (\n", " \\(Q_{3-1|B-R}\\)\n", " )
\n", "\n", " \\(Q_{3-1}\\)\n", " applied to the difference between both bands of a light curve (B-R).
\n", ">>> fs = feets.FeatureSpace(only=['Q31_color'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Q31_color': 1.8840489594535512}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2014epoch \n", "Kim, D. W., Protopapas, P., Bailer-Jones, C. A., Byun, Y. I., Chang, S. W., Marquette, J. B., & Shin, M. S. (2014). The EPOCH Project: I. Periodic Variable Stars in the EROS-2 LMC Database. arXiv preprint Doi:10.1051/0004-6361/201323252. \n", "
\n", "
\n", "\n", "Rcs - Range of cumulative sum (\n", " \\(R_{cs}\\)\n", " )
\n", "\n", " \\(R_{cs}\\)\n", " is the range of a cumulative sum (Ellaway 1978) of each light-curve and is defined as:
\n", "\\begin{align*}\n", "R_{cs} = \\max(S) - \\min(S) \\\\\n", "S_l = \\frac{1}{N \\sigma} \\sum_{i=1}^l (m_i - \\bar{m})\n", "\\end{align*}\n", "where max (min) is the maximum (minimum) value of\n", " \\(S_l\\)\n", " over\n", " \\(l=1,2, \\dots, N\\)\n", " .
\n", "\n", " \\(R_{cs}\\)\n", " should take a value close to zero for any symmetric distribution:
\n", ">>> fs = feets.FeatureSpace(only=['Rcs'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Rcs': 0.0094459606901065168}\n", "
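The cumulative sum in the definition can be sketched as follows. The helper `rcs` is hypothetical and the population standard deviation is assumed for the normalization.

```python
import statistics
from itertools import accumulate


def rcs(magnitude):
    """Range of the normalized cumulative sum of deviations from the mean."""
    n = len(magnitude)
    mean = statistics.fmean(magnitude)
    sigma = statistics.pstdev(magnitude)
    partial_sums = accumulate(m - mean for m in magnitude)
    s = [p / (n * sigma) for p in partial_sums]
    return max(s) - min(s)


# A steadily rising series: the partial sums are -2, -3, -3, -2, 0.
value = rcs([1.0, 2.0, 3.0, 4.0, 5.0])
print(value)
```

A series with a systematic trend, like the one above, gives a much larger value than a series whose deviations from the mean are scattered without order.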
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "
\n", "
\n", "\n", "Skew
\n", "The skewness of a sample is defined as follows:
\n", " \\(Skewness = \\frac{N}{(N-1)(N-2)}\n", " \\sum_{i=1}^N (\\frac{m_i-\\hat{m}}{\\sigma})^3\\)\n", "Example:
\n", "For a normal distribution it should be equal to zero:
\n", ">>> fs = feets.FeatureSpace(only=['Skew'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Skew': -0.00023325826785278685}\n", "
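The formula can be transcribed directly. The helper `skewness` is hypothetical, and taking sigma as the sample standard deviation (with N-1 in the denominator) is an assumption of this sketch.

```python
import statistics


def skewness(magnitude):
    """Sample skewness with the N / ((N-1)(N-2)) correction factor."""
    n = len(magnitude)
    mean = statistics.fmean(magnitude)
    sigma = statistics.stdev(magnitude)
    total = sum(((m - mean) / sigma) ** 3 for m in magnitude)
    return n / ((n - 1) * (n - 2)) * total


symmetric = skewness([1.0, 2.0, 3.0, 4.0, 5.0])     # symmetric around the mean
right_tailed = skewness([1.0, 1.0, 1.0, 1.0, 6.0])  # one large value on the right
print(symmetric, right_tailed)
```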
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "SlottedA_length - Slotted Autocorrelation
\n", "In slotted autocorrelation, time lags are defined as intervals or slots instead of single values. The slotted autocorrelation function at a certain time lag slot is computed by averaging the cross product between samples whose time differences fall in the given slot.
\n", " \\(\\hat{\\rho}(\\tau=kh) = \\frac {1}{\\hat{\\rho}(0)\\,N_\\tau}\n", " \\sum_{t_i}\\sum_{t_j= t_i+(k-1/2)h }^{t_i+(k+1/2)h}\n", " \\bar{y}_i(t_i)\\,\\, \\bar{y}_j(t_j)\\)\n", "Where\n", " \\(h\\)\n", " is the slot size,\n", " \\(\\bar{y}\\)\n", " is the normalized magnitude,\n", " \\(\\hat{\\rho}(0)\\)\n", " is the slotted autocorrelation for the first lag, and\n", " \\(N_\\tau\\)\n", " is the number of pairs that fall in the given slot.
\n", ">>> fs = feets.FeatureSpace(\n", "... only=['SlottedA_length'], SlottedA_length={"T": 1})\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'SlottedA_length': 1.}\n", "Parameters
\n", "\n", "
\n", "- \n", "
T
: tau - slot size in days (default=1).
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "huijse2012information \n", "Huijse, P., Estevez, P. A., Protopapas, P., Zegers, P., & Principe, J. C. (2012). An information theoretic algorithm for finding periodicities in stellar light curves. IEEE Transactions on Signal Processing, 60(10), 5135-5145. \n", "
1
\n",
" \n", "
\n", "\n", "SmallKurtosis
\n", "Small sample kurtosis of the magnitudes.
\n", " \\(SmallKurtosis = \\frac{N (N+1)}{(N-1)(N-2)(N-3)}\n", " \\sum_{i=1}^N (\\frac{m_i-\\hat{m}}{\\sigma})^4 -\n", " \\frac{3( N-1 )^2}{(N-2) (N-3)}\\)\n", "For a normal distribution, the small kurtosis should be zero:
\n", ">>> fs = feets.FeatureSpace(only=['SmallKurtosis'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'SmallKurtosis': 0.044451779515607193}\n", "See http://www.xycoon.com/peakedness_small_sample_test_1.htm
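As a sanity check of the formula, the sketch below evaluates it on an evenly spaced grid, which mimics a uniform distribution whose excess kurtosis is -1.2. The helper `small_kurtosis` is hypothetical and sigma is assumed to be the sample standard deviation.

```python
import statistics


def small_kurtosis(magnitude):
    """Small-sample (bias-corrected) excess kurtosis."""
    n = len(magnitude)
    mean = statistics.fmean(magnitude)
    sigma = statistics.stdev(magnitude)
    total = sum(((m - mean) / sigma) ** 4 for m in magnitude)
    factor = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return factor * total - correction


n = 1000
grid = [(i + 0.5) / n for i in range(n)]  # deterministic uniform-like sample
kurt = small_kurtosis(grid)
print(kurt)
```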
\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "Std - Standard deviation of the magnitudes
\n", "The standard deviation\n", " \\(\\sigma\\)\n", " of the sample is defined as:
\n", " \\(\\sigma=\\sqrt{\\frac{1}{N-1}\\sum_{i} (y_{i}-\\hat{y})^2}\\)\n", "For example, a white noise time series with unit variance should have\n", " \\(\\sigma=1\\)\n", "
\n", ">>> fs = feets.FeatureSpace(only=['Std'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'Std': 0.99320419310116881}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
These three features are based on the Welch/Stetson variability index\n", " \\(I\\)\n", " (Stetson, 1996) defined by the equation:
\n", " \\(I = \\sqrt{\\frac{1}{n(n-1)}} \\sum_{i=1}^n {\n", " (\\frac{b_i-\\hat{b}}{\\sigma_{b,i}})\n", " (\\frac{v_i - \\hat{v}}{\\sigma_{v,i}})}\\)\n", "where\n", " \\(b_i\\)\n", " and\n", " \\(v_i\\)\n", " are the apparent magnitudes obtained for the candidate star in two observations closely spaced in time on some occasion\n", " \\(i\\)\n", " ,\n", " \\(\\sigma_{b, i}\\)\n", " and\n", " \\(\\sigma_{v, i}\\)\n", " are the standard errors of those magnitudes,\n", " \\(\\hat{b}\\)\n", " and\n", " \\(\\hat{v}\\)\n", " are the weighted mean magnitudes in the two filters, and\n", " \\(n\\)\n", " is the number of observation pairs.
\n", "Since a given frame pair may include data from two filters which did not have equal numbers of observations overall, the \"relative error\" is calculated as follows:
\n", " \\(\\delta = \\sqrt{\\frac{n}{n-1}} \\frac{v-\\hat{v}}{\\sigma_v}\\)\n", "allowing all residuals to be compared on an equal basis.
\n", "\n", "\n", "StetsonJ
\n", "Stetson J is a robust version of the variability index. It is calculated based on two simultaneous light curves of the same star and is defined as:
\n", " \\(J = \\sum_{k=1}^n sgn(P_k) \\sqrt{|P_k|}\\)\n", "with\n", " \\(P_k = \\delta_{i_k} \\delta_{j_k}\\)\n", "
\n", "For a Gaussian magnitude distribution, J should take a value close to zero:
\n", ">>> fs = feets.FeatureSpace(only=['StetsonJ'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'StetsonJ': 0.010765631555204736}\n", "
References
\n", "\n", "\n", " \n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "\n", "StetsonK
\n", "Stetson K is a robust kurtosis measure:
\n", " \\(\\frac{1/N \\sum_{i=1}^N |\\delta_i|}{\\sqrt{1/N \\sum_{i=1}^N \\delta_i^2}}\\)\n", "where the index\n", " \\(i\\)\n", " runs over all\n", " \\(N\\)\n", " observations available for the star without regard to pairing. For a Gaussian magnitude distribution K should take a value close to\n", " \\(\\sqrt{2/\\pi} = 0.798\\)\n", " :
\n", ">>> fs = feets.FeatureSpace(only=['StetsonK'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'StetsonK': 0.79914938521401002}\n", "
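The sketch below evaluates the Stetson K formula on a deterministic Gaussian-shaped sample built from normal quantiles, so the expected value of about 0.798 can be checked without random numbers. The helper `stetson_k` is hypothetical; the inverse-variance weighted mean and the unit errors are assumptions of this example.

```python
import math
from statistics import NormalDist


def stetson_k(magnitude, error):
    """Ratio of the mean absolute residual to the RMS residual."""
    n = len(magnitude)
    weights = [1.0 / e ** 2 for e in error]
    mean_mag = sum(w * m for w, m in zip(weights, magnitude)) / sum(weights)
    scale = math.sqrt(n / (n - 1))
    # Relative error delta for every observation.
    delta = [scale * (m - mean_mag) / e for m, e in zip(magnitude, error)]
    mean_abs = sum(abs(d) for d in delta) / n
    rms = math.sqrt(sum(d * d for d in delta) / n)
    return mean_abs / rms


n = 2000
# Evenly spaced quantiles of a standard normal: a noise-free Gaussian sample.
magnitude = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
error = [1.0] * n
k_gauss = stetson_k(magnitude, error)
print(k_gauss)
```

Because K is a ratio, the overall scale factor cancels; only the shape of the residual distribution matters.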
References
\n", "\n", "\n", " \n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "richards2011machine \n", "Richards, J. W., Starr, D. L., Butler, N. R., Bloom, J. S., Brewer, J. M., Crellin-Quick, A., ... & Rischard, M. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal, 733(1), 10. Doi:10.1088/0004-637X/733/1/10. \n", "
\n", "
\n", "\n", "\n", "StetsonK_AC
\n", "Stetson K applied to the slotted autocorrelation function of the light-curve.
\n", ">>> fs = feets.FeatureSpace(only=['SlottedA_length','StetsonK_AC'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'SlottedA_length': 1.0, 'StetsonK_AC': 0.20917402545294403}\n", "Parameters
\n", "\n", "
\n", "- \n", "
T
: tau - slot size in days (default=1).
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "
1
\n",
" \n", "
\n", "\n", "\n", "StetsonL
\n", "Stetson L variability index describes the synchronous variability of different bands and is defined as:
\n", " \\(L = \\frac{JK}{0.798}\\)\n", "Again, for a Gaussian magnitude distribution, L should take a value close to zero:
\n", ">>> fs = feets.FeatureSpace(only=['StetsonL'])\n", ">>> features, values = fs.extract(**lc_normal)\n", ">>> dict(zip(features, values))\n", "{'StetsonL': 0.0085957106316273714}\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "kim2011quasi \n", "Kim, D. W., Protopapas, P., Byun, Y. I., Alcock, C., Khardon, R., & Trichas, M. (2011). Quasi-stellar object selection algorithm using time variability and machine learning: Selection of 1620 quasi-stellar object candidates from MACHO Large Magellanic Cloud database. The Astrophysical Journal, 735(2), 68. Doi:10.1088/0004-637X/735/2/68. \n", "
\n", "
References
\n", "\n", "\n", "\n", "\n", " \n", "
\n", "\n", " \n", " \n", "simonetti1984small \n", "Simonetti, J. H., Cordes, J. M., & Spangler, S. R. (1984). Small-scale variations in the galactic magnetic field-The rotation measure structure function and birefringence in interstellar scintillations. The Astrophysical Journal, 284, 126-134. \n", "