This section digs into what noise is, and what can cause it.
Definition
"Noise" refers to random fluctuations in data that cannot be attributed to any of the variables being measured or modelled.
Below are two plots of sine waves. The first contains noise, which is visible as small random fluctuations around the curve. The second contains no noise and traces a smooth curve.
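The difference between the two sine waves can also be sketched in code. The example below is a minimal illustration rather than the code used to produce the plots: the number of points, the noise level (`noise_sd`) and the choice of Gaussian noise are all assumptions made for demonstration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

n_points = 200
noise_sd = 0.3  # assumed noise level, chosen only for illustration

# Evenly spaced x values covering two full periods of the sine wave
xs = [4 * math.pi * i / (n_points - 1) for i in range(n_points)]

# The clean signal: a pure sine wave with no noise
clean = [math.sin(x) for x in xs]

# The noisy signal: the same sine wave plus a random Gaussian fluctuation
noisy = [y + random.gauss(0, noise_sd) for y in clean]

# The residuals (noisy minus clean) are the noise itself
residuals = [n - c for n, c in zip(noisy, clean)]
mean_residual = sum(residuals) / n_points
rms_residual = math.sqrt(sum(r * r for r in residuals) / n_points)

print(f"mean of noise: {mean_residual:.3f}")
print(f"RMS of noise:  {rms_residual:.3f}")
```

The noise averages out close to zero, but its typical size (the RMS value) is close to the chosen `noise_sd`. Plotting `clean` and `noisy` against `xs` with a plotting library would give a smooth curve and a jagged one respectively.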
Different Causes of Noise
Noise can arise for several reasons. Some common causes are:
- Measurement Errors: Measurement errors occur when there are inaccuracies or imprecisions in the process of collecting data. These errors can stem from instrument malfunction, human error, or limitations in measurement techniques.
- Sampling Variability: When data is collected from a sample rather than the entire population, the sample will differ from the population by chance, and different samples will differ from one another. This introduces noise into the statistics and estimates calculated from the sample.
- Random Noise: Random noise refers to unpredictable variations in the data that do not follow any pattern or trend. It can result from inherent randomness in the system being studied or from external factors that influence how the data is generated or collected.
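Sampling variability, one of the causes listed above, is straightforward to demonstrate. The sketch below assumes a hypothetical population of 10,000 normally distributed values and a sample size of 25; both are illustrative choices, not figures from a real dataset.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# A hypothetical "population" of 10,000 values (illustrative only)
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Draw several small samples and record the mean of each one
sample_size = 25
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(5)
]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.2f} (differs by {m - population_mean:+.2f})")
```

Each sample gives a slightly different mean, even though every sample comes from the same population. That spread is the noise introduced by sampling.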
What's the Problem with Noise?
Noise causes several problems when analysing data. One is "overfitting", where a model captures random fluctuations in the data rather than the true underlying relationship. An overfit model performs well on the data used to fit it, but generalises poorly to new data, which leads to inaccurate predictions.
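Overfitting can be illustrated with a small experiment. The sketch below assumes a true relationship of y = 2x with Gaussian noise, and compares a straight-line fit with a deliberately overfit model that memorises every training point (a nearest-neighbour lookup). Both models and all parameter values are illustrative choices.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

def make_data(n, noise_sd=1.0):
    """Draw n points from the assumed relationship y = 2x, plus Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """Memorise the training data; predict the y of the closest training x."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)   # data used to fit the models
test_x, test_y = make_data(500)    # new data the models have not seen

line = fit_line(train_x, train_y)
nearest = fit_nearest(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
near_train, near_test = mse(nearest, train_x, train_y), mse(nearest, test_x, test_y)

print(f"straight line:     train MSE {line_train:.2f}, test MSE {line_test:.2f}")
print(f"memorising model:  train MSE {near_train:.2f}, test MSE {near_test:.2f}")
```

The memorising model reproduces its training data perfectly because it has learned the noise along with the signal, so its error on new data is noticeably worse than that of the simple straight line.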
Noise also reduces the precision of a model or plot by introducing additional fluctuations into the data. The resulting uncertainty can undermine the reliability of conclusions drawn from the analysis.
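The loss of precision can be made concrete by repeating the same experiment many times and seeing how much an estimate varies. The sketch below assumes a true slope of 2 and compares two hypothetical noise levels; the function names and all parameter values are illustrative.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

def estimate_slope(noise_sd, n=30):
    """Fit a least-squares slope to n points from y = 2x plus Gaussian noise."""
    xs = [10 * i / (n - 1) for i in range(n)]
    ys = [2 * x + random.gauss(0, noise_sd) for x in xs]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slope_spread(noise_sd, repeats=200):
    """Standard deviation of the slope estimate across repeated experiments."""
    estimates = [estimate_slope(noise_sd) for _ in range(repeats)]
    return statistics.stdev(estimates)

low_noise_spread = slope_spread(0.5)
high_noise_spread = slope_spread(5.0)

print(f"spread of slope estimates, low noise:  {low_noise_spread:.3f}")
print(f"spread of slope estimates, high noise: {high_noise_spread:.3f}")
```

With more noise in the data, the estimated slope varies far more from one experiment to the next, so any single estimate is less trustworthy.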