Yes, we can implement simple linear regression in pure Python, without relying on external libraries like scikit-learn. The algorithm comes down to calculating the slope (coefficient) and intercept of the line that best fits the data.
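For reference, "best fits" here means the line that minimizes the sum of squared residuals. The symbols $m$ (slope), $b$ (intercept), and $n$ (number of points) are notation introduced for this sketch. Assuming a single feature, the quantity to minimize is:

$$
S(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2
$$

Setting $\partial S / \partial m = 0$ and $\partial S / \partial b = 0$ and solving the two resulting equations gives the closed-form solution:

$$
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - m \sum x_i}{n}
$$

The denominator of $m$ is zero when all $x_i$ are identical, in which case no unique line exists.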
Here's a pure Python implementation of linear regression using the method of least squares:
```python
# Step 1: Define the data
# For simplicity, this example uses a small illustrative sample with one feature,
# standing in for a housing dataset such as Boston Housing (these are not actual records).
# In a real-world scenario, you would load the data from a file or another source.
X = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # Input feature (e.g., number of rooms)
y = [3.0, 4.0, 2.5, 5.0, 6.0, 8.0, 7.5]  # Target variable (e.g., median house price)
# Step 2: Implement linear regression
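def linear_regression(X, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(X)
    sum_x = sum(X)
    sum_y = sum(y)
    # Use xi/yi so the loop variables do not shadow the argument y
    sum_xy = sum(xi * yi for xi, yi in zip(X, y))
    sum_x_squared = sum(xi ** 2 for xi in X)
    # Calculate the slope (coefficient) and intercept of the line
    denominator = n * sum_x_squared - sum_x ** 2
    if denominator == 0:
        raise ValueError("X must contain at least two distinct values")
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept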
# Step 3: Fit the model and get the coefficients
slope, intercept = linear_regression(X, y)
# Step 4: Make predictions on new data
def predict(X, slope, intercept):
    return [slope * x + intercept for x in X]
# Step 5: Evaluate the model's performance
# For simplicity, let's calculate the mean squared error (MSE).
def mean_squared_error(y_true, y_pred):
    n = len(y_true)
    squared_errors = [(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]
    return sum(squared_errors) / n
# Make predictions on the training data
y_pred_train = predict(X, slope, intercept)
# Calculate the mean squared error of the predictions
mse_train = mean_squared_error(y, y_pred_train)
print(f"Slope (Coefficient): {slope:.4f}")
print(f"Intercept: {intercept:.4f}")
print(f"Mean Squared Error: {mse_train:.4f}")
```
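Step 4 above talks about predicting on new data, but the script only predicts on the training inputs. As a minimal sketch of the missing piece, the block below refits the same toy data, predicts for two unseen inputs, and reports the coefficient of determination (R²) alongside the fit. The names `fit_line`, `r_squared`, and `X_new` are hypothetical helpers introduced for this illustration, and the values in `X_new` are made up.

```python
# Same toy data as in the example above.
X = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.0, 4.0, 2.5, 5.0, 6.0, 8.0, 7.5]


def fit_line(xs, ys):
    """Hypothetical helper: closed-form least squares for one feature."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x_squared = sum(xi ** 2 for xi in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def r_squared(y_true, y_pred):
    """Hypothetical helper: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(X, y)

# Assumed unseen inputs, chosen only for illustration.
X_new = [9.0, 10.0]
y_new = [slope * x + intercept for x in X_new]

# R^2 on the training data: the share of variance in y explained by the line.
r2 = r_squared(y, [slope * x + intercept for x in X])

print(f"Predictions for {X_new}: {[round(v, 4) for v in y_new]}")
print(f"R^2 on training data: {r2:.4f}")
```

Both inputs in `X_new` lie outside the range of the training data, so these predictions are extrapolations and should be treated with more caution than predictions inside the 2.0 to 8.0 range.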
Note that this is a simplified example using a small dataset with a single feature. In a real-world scenario, you would load a larger dataset and perform additional preprocessing steps to prepare the data for the linear regression model. Additionally, scikit-learn and other libraries offer optimized implementations of linear regression that also handle multiple features, so using them is recommended for practical applications. However, this pure Python implementation illustrates the fundamental concepts behind linear regression.
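One way to sanity-check a hand-written fit without installing anything is to compare it against `statistics.linear_regression`, which is part of the standard library from Python 3.10 onward. The sketch below assumes the same toy data. `fit_line` is a hypothetical helper that mirrors the formula used earlier, and the comparison is skipped on older Python versions.

```python
import math
import statistics

# Same toy data as in the example above.
X = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.0, 4.0, 2.5, 5.0, 6.0, 8.0, 7.5]


def fit_line(xs, ys):
    """Hypothetical helper: closed-form least squares for one feature."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x_squared = sum(xi ** 2 for xi in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


slope, intercept = fit_line(X, y)

# statistics.linear_regression exists only on Python 3.10+, so guard the call.
if hasattr(statistics, "linear_regression"):
    reference = statistics.linear_regression(X, y)
    matches = (
        math.isclose(slope, reference.slope, rel_tol=1e-9)
        and math.isclose(intercept, reference.intercept, rel_tol=1e-9)
    )
    print(f"Matches statistics.linear_regression: {matches}")
else:
    matches = None
    print("statistics.linear_regression not available (requires Python 3.10+)")
```

If the two results disagree beyond floating-point tolerance, the hand-written formula is the first place to look for a bug.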