Can you help me create a dataset? Its generation parameters come from an identity (ID) document number. If the ID number has fewer than 8 digits, we repeat its first digits until it has exactly 8. We must avoid the digits 0 and 1: any digit less than 2 is replaced with 2.
Example 1: 12345679 -> 22345679
Example 2: 804156 -> 80415680 -> 82425682
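A minimal Python sketch of the normalization rule. It assumes the ID is given as a string of 1 to 8 decimal digits; the question does not say how to handle longer IDs or IDs containing letters, so this sketch simply rejects them. For IDs shorter than 4 digits it keeps repeating the digits cyclically, which is one reading of "replicate the first ones". The name `normalize_id` is hypothetical.

```python
def normalize_id(raw_id: str, length: int = 8) -> str:
    """Pad an ID to `length` digits by repeating its leading digits, then replace 0/1 with 2."""
    digits = raw_id.strip()
    # Assumption: only 1..length decimal digits are accepted; the question does not
    # say what to do with longer IDs or IDs that contain letters.
    if not digits or not set(digits) <= set("0123456789") or len(digits) > length:
        raise ValueError(f"expected 1 to {length} decimal digits, got {raw_id!r}")
    # Repeat the digits cyclically (ceiling division) and keep the first `length`,
    # e.g. "804156" -> "804156804156" -> "80415680".
    repeats = -(-length // len(digits))
    padded = (digits * repeats)[:length]
    # Avoid 0 and 1: any digit below 2 becomes 2.
    return "".join("2" if c in "01" else c for c in padded)


print(normalize_id("12345679"))  # 22345679
print(normalize_id("804156"))    # 82425682
```

Under this reading, Example 2 pads to 80415680 and normalizes to 82425682.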
The dataset is generated with sklearn.datasets.make_regression from the scikit-learn library, using the following arguments (each "digit" refers to the normalized 8-digit ID):
- n_samples = 200 + 10 * first digit
- n_features (number of predictors) = 10 + second digit + third digit
- n_informative = 10 + second digit
- bias = 2
- noise = 10 * fourth digit
- random_state (seed) = the normalized ID number
- shuffle = False
Since every digit is at least 2, the dataset will have at least 220 observations and at least 14 predictor variables, of which at least 2 (n_features - n_informative = third digit) will not be related to the response variable.
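The lower bounds above can be checked with a short sketch that takes the smallest possible normalized ID (every digit equal to 2) and also asks `make_regression` for the true coefficients via `coef=True`, an extra argument that is not part of the specification. In scikit-learn's implementation, with `shuffle=False` only the first `n_informative` columns receive non-zero coefficients, so the trailing columns are pure noise with respect to the target.

```python
import numpy as np
from sklearn.datasets import make_regression

# Smallest possible normalized ID: every digit has been raised to at least 2.
norm_id = "22222222"
d = [int(c) for c in norm_id]  # d[0] = first digit, d[1] = second digit, ...

n_informative = 10 + d[1]
X, y, coef = make_regression(
    n_samples=200 + 10 * d[0],        # 220
    n_features=10 + d[1] + d[2],      # 14
    n_informative=n_informative,      # 12
    bias=2.0,
    noise=10.0 * d[3],                # 20.0
    shuffle=False,
    coef=True,                        # extra: also return the true coefficients
    random_state=int(norm_id),
)

print(X.shape)                     # (220, 14)
# With shuffle=False only the first n_informative columns get non-zero coefficients,
# so the remaining columns are unrelated to y.
print(np.flatnonzero(coef == 0))   # [12 13]
```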
I found this example:

```python
# import library
from sklearn.datasets import make_regression

# create features and targets
features, target = make_regression(n_samples=100,
                                   n_features=10,
                                   n_informative=5,
                                   n_targets=1,
                                   random_state=42)

# print features and target
print("Features:")
print(features[:5])
print("Target:")
print(target[:5])
```
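Adapting that example to the assignment might look like the following sketch. It chains the normalization rule and the argument formulas from the question; the helper names `normalize_id` and `make_id_dataset` are hypothetical, IDs are assumed to be strings of 1 to 8 digits, and "804156" (Example 2) is only a placeholder for a real ID.

```python
from sklearn.datasets import make_regression


def normalize_id(raw_id: str, length: int = 8) -> str:
    """Repeat leading digits up to `length`, then replace 0 and 1 with 2."""
    digits = raw_id.strip()
    # Assumption: IDs are 1..length decimal digits; anything else is rejected.
    if not digits or not set(digits) <= set("0123456789") or len(digits) > length:
        raise ValueError(f"expected 1 to {length} decimal digits, got {raw_id!r}")
    padded = (digits * -(-length // len(digits)))[:length]
    return "".join("2" if c in "01" else c for c in padded)


def make_id_dataset(raw_id: str):
    """Generate the regression dataset using the ID-derived arguments."""
    norm_id = normalize_id(raw_id)
    d = [int(c) for c in norm_id]  # d[0] = first digit, d[1] = second digit, ...
    features, target = make_regression(
        n_samples=200 + 10 * d[0],
        n_features=10 + d[1] + d[2],
        n_informative=10 + d[1],
        bias=2.0,
        noise=10.0 * d[3],
        shuffle=False,
        random_state=int(norm_id),
    )
    return norm_id, features, target


# Placeholder: Example 2 from the question; replace it with a real ID.
norm_id, features, target = make_id_dataset("804156")
print("Normalized ID:", norm_id)
print("Features:")
print(features[:5])
print("Target:")
print(target[:5])
```

For Example 2 the normalized ID is 82425682, which gives 280 observations and 16 predictors, 12 of them informative.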