LinearRegressionWithSGD
class pyspark.mllib.regression.LinearRegressionWithSGD
Train a linear regression model with no regularization using Stochastic Gradient Descent.
New in version 0.9.0.
Deprecated since version 2.0.0: Use pyspark.ml.regression.LinearRegression.
Methods
train(data[, iterations, step, …]): Train a linear regression model using Stochastic Gradient Descent (SGD).
Methods Documentation
classmethod train(data: pyspark.rdd.RDD[pyspark.mllib.regression.LabeledPoint], iterations: int = 100, step: float = 1.0, miniBatchFraction: float = 1.0, initialWeights: Optional[VectorLike] = None, regParam: float = 0.0, regType: Optional[str] = None, intercept: bool = False, validateData: bool = True, convergenceTol: float = 0.001) → pyspark.mllib.regression.LinearRegressionModel
Train a linear regression model using Stochastic Gradient Descent (SGD). This solves the least squares regression formulation
f(weights) = 1/(2n) ||A weights - y||^2
which is half the mean squared error. Here the data matrix A has n rows, and the input RDD holds the rows of A, each paired with its label, the corresponding entry of the right-hand side y. See also the documentation for the precise formulation.
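As an illustration of the objective above, the following minimal sketch evaluates f(weights) = 1/(2n) ||A weights - y||^2 in plain Python on a small made-up data set. The function name `least_squares_objective` and the toy values of `A` and `y` are assumptions chosen for this example; they are not part of the PySpark API.

```python
from typing import Sequence


def least_squares_objective(
    rows: Sequence[Sequence[float]],
    labels: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Evaluate f(weights) = 1/(2n) * ||A weights - y||^2 for a small data set."""
    n = len(rows)
    squared_error_sum = 0.0
    for row, label in zip(rows, labels):
        # Prediction for one row is the dot product of that row with the weights.
        prediction = sum(a * w for a, w in zip(row, weights))
        squared_error_sum += (prediction - label) ** 2
    return squared_error_sum / (2 * n)


# Hypothetical toy data: four rows of A (one feature each) and their labels y.
A = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 3.0, 2.0]

print(least_squares_objective(A, y, [1.0]))  # 0.25
print(least_squares_objective(A, y, [0.0]))  # 1.75
```

With weights = [1.0] the residuals are 0, 0, -1, and 1, so the sum of squares is 2 and the objective is 2 / (2 * 4) = 0.25.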
New in version 0.9.0.
- Parameters
- data: pyspark.RDD
 The training data, an RDD of LabeledPoint.
- iterations: int, optional
 The number of iterations. (default: 100)
- step: float, optional
 The step parameter used in SGD. (default: 1.0)
- miniBatchFraction: float, optional
 Fraction of data to be used for each SGD iteration. (default: 1.0)
- initialWeights: pyspark.mllib.linalg.Vector or convertible, optional
 The initial weights. (default: None)
- regParam: float, optional
 The regularizer parameter. (default: 0.0)
- regType: str, optional
 The type of regularizer used for training the model. Supported values:
 “l1” for L1 regularization
 “l2” for L2 regularization
 None for no regularization (default)
- intercept: bool, optional
 Whether to use the augmented representation for training data (i.e., whether bias features are activated). (default: False)
- validateData: bool, optional
 Whether the algorithm should validate the data before training. (default: True)
- convergenceTol: float, optional
 The convergence tolerance that decides when iteration terminates. (default: 0.001)
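The parameters above can be passed to `train` as keyword arguments. The sketch below is one possible way to call it, assuming a local Spark installation; the toy data in `RAW_POINTS`, the values in `TRAIN_KWARGS`, and the helper names `train_toy_model` and `run_demo` are hypothetical and exist only for this illustration. The pyspark imports sit inside the functions so that the module can be loaded without Spark present.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical toy data as (label, features) pairs.
RAW_POINTS: List[Tuple[float, List[float]]] = [
    (0.0, [0.0]),
    (1.0, [1.0]),
    (3.0, [2.0]),
    (2.0, [3.0]),
]

# Illustrative keyword arguments; all except iterations and initialWeights
# repeat the documented defaults.
TRAIN_KWARGS: Dict[str, Any] = {
    "iterations": 10,
    "step": 1.0,
    "miniBatchFraction": 1.0,
    "initialWeights": [1.0],
    "regParam": 0.0,
    "regType": None,
    "intercept": False,
    "validateData": True,
    "convergenceTol": 0.001,
}


def train_toy_model(sc, raw_points=RAW_POINTS, train_kwargs=TRAIN_KWARGS):
    """Train a LinearRegressionModel on the toy data using SparkContext `sc`."""
    # Imported here so this module loads even where pyspark is not installed.
    from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

    data = sc.parallelize(
        [LabeledPoint(label, features) for label, features in raw_points]
    )
    return LinearRegressionWithSGD.train(data, **train_kwargs)


def run_demo() -> None:
    """Start a local SparkContext, train on the toy data, and print the model."""
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "LinearRegressionWithSGD-sketch")
    try:
        model = train_toy_model(sc)
        print(model.weights, model.intercept)
        print(model.predict([2.0]))
    finally:
        sc.stop()
```

Calling `run_demo()` in an environment with pyspark installed trains the model and prints its weights, intercept, and a prediction for a single feature vector. Because the class is deprecated, a warning may be emitted when `train` is called.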
- 
classmethod