## Introduction To Statistical Analysis In Electrical Engineering

In the age of digitalization, billions of consumers produce vast amounts of data every day. Unorganized data, whether in the form of pictures, audio, words, or numbers, is of little use until it is structured and organized so that meaningful information can be extracted from it. Statistical analysis can be defined along the same lines:

“It involves gathering, organizing, studying and extracting meaningful data, patterns, and trends”.

## Statistical Analysis In Electrical Engineering

Statistical analysis plays a leading role in improving an existing system. The following are some of the major areas where statistical analysis is currently being used.

## Statistics In Machine Learning

As the world moves towards autonomous systems that are capable of predicting, analyzing, and making decisions, machine learning has become an increasingly prominent field.

Statistics is the foundation of machine learning. It is needed to build models and make predictions effectively, including with deep learning. The steps below show where statistics is used in a typical machine learning project.

- **Problem Framing:** This involves framing a business problem as a machine learning problem and determining what is to be predicted. For example, with the help of data analysis and data mining, the future sales of a product can be predicted from past and current sales. This helps a manufacturer decide which products to focus on for better performance.
- **Data Understanding:** Summary statistics and data visualization present the data in a form that is easy to read and understand.
- **Data Cleaning:** The gathered data is filtered and corrected before it is explored further, using statistical techniques such as outlier detection and imputation. Removing unnecessary data saves time, and clean data is easier to understand and model.
- **Data Selection:** Not all features need to be used in machine learning; often the best results come from using only the most important ones. Data sampling and feature selection methods are used for this. Selecting data reduces the training time, the evaluation time, and the required computational power.
- **Data Preparation:** Some changes in the shape or structure of the data are needed for learning algorithms to work efficiently. The following statistical methods are commonly used:
  - **Scaling:** Techniques such as normalization and standardization. Input variables are rescaled to the range 0 to 1 (normalization) or to a mean of 0 and a standard deviation of 1 (standardization).
  - **Transforms:** Power transforms that make numerical input and output variables follow a more Gaussian-like probability distribution.
  - **Encoding:** Techniques such as integer encoding and one-hot encoding.
- **Model Evaluation:** This is a crucial step in the process. Experimental design and resampling methods split the data into subsets for training and evaluation, which makes the estimate of the algorithm’s accuracy more reliable.
- **Model Configuration:** Machine learning algorithms have hyperparameters that can be used to tailor the algorithm to a specific problem. Statistical hypothesis tests and estimation statistics are used to check the accuracy of the resulting configurations.
- **Model Selection:** Many machine learning algorithms may be able to reach the desired result, each with its own pros and cons. Choosing the best-fit algorithm among them is called model selection. It uses statistical hypothesis tests and estimation statistics.
- **Model Presentation:** Once the training of the final model is complete and it is ready to be presented to the stakeholders, the estimated skill of the model is reported using estimation statistics such as confidence intervals and tolerance intervals.
- **Model Prediction:** Once the final model is ready to be used for making predictions, the uncertainty of each prediction needs to be quantified using estimation statistics such as confidence intervals and prediction intervals.
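The scaling and encoding steps of data preparation can be sketched with nothing but the Python standard library. This is a minimal illustration, not a full preprocessing pipeline: the sensor readings and component categories are made-up values, and the variable names are assumptions chosen for this example.

```python
from statistics import mean, pstdev

# Hypothetical sensor readings in volts (illustrative values only).
readings = [3.1, 3.3, 2.9, 3.0, 3.2]

# Normalization (min-max scaling): smallest value maps to 0, largest to 1.
lo, hi = min(readings), max(readings)
normalized = [(x - lo) / (hi - lo) for x in readings]

# Standardization (z-score): result has mean 0 and standard deviation 1.
mu, sigma = mean(readings), pstdev(readings)
standardized = [(x - mu) / sigma for x in readings]

# One-hot encoding of a hypothetical categorical feature.
categories = ["resistor", "capacitor", "inductor", "capacitor"]
labels = sorted(set(categories))  # fixed, reproducible column order
one_hot = [[1 if c == label else 0 for label in labels] for c in categories]

print("normalized:  ", [round(v, 2) for v in normalized])
print("standardized:", [round(v, 2) for v in standardized])
print("columns:     ", labels)
print("one-hot rows:", one_hot)
```

In practice a library such as scikit-learn would usually handle these steps; the hand-written version is only meant to show what the statistics are doing.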

## Statistics In Industry

Statistical analysis is used to increase productivity in industry. Let’s look at an example.

Consider an electrical device manufacturer that is struggling to make a profit. The business is going downhill, and there seems to be no option but to shut down the production of certain products. Using statistics and probability, past and present sales records can be evaluated to predict the future sales of every product. Production of a loss-making product can then be reduced or discontinued, while production of a product that is predicted to grow in sales and profit can be increased.

This can save the manufacturer from making the worst possible decision.
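One simple way to turn past sales into a prediction is to fit a straight-line trend by least squares and extend it one period ahead. The sketch below assumes exactly that; the product names and sales figures are invented for illustration, and a real forecast would likely need to account for seasonality and uncertainty as well.

```python
from statistics import mean

def linear_trend(values):
    """Least-squares fit of y = intercept + slope * t, with t = 0, 1, 2, ..."""
    t = range(len(values))
    t_mean, y_mean = mean(t), mean(values)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept

# Hypothetical quarterly unit sales for two made-up products.
sales = {
    "relay": [120, 110, 95, 90, 80, 70],
    "smart_meter": [40, 55, 60, 75, 85, 100],
}

decisions = {}
for product, history in sales.items():
    slope, intercept = linear_trend(history)
    forecast = intercept + slope * len(history)  # next quarter
    decisions[product] = "increase" if slope > 0 else "reduce"
    print(f"{product}: trend {slope:+.1f} units/quarter, "
          f"next-quarter forecast {forecast:.0f}, suggestion: {decisions[product]}")
```

The sign of the fitted slope is used here as a crude decision rule: a falling trend suggests reducing production, a rising one suggests increasing it.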

## Statistics In Digital Signal Processing

Statistics and probability play a huge role in signal processing. Signal processing is used extensively today as our dependence on computers increases. It is a basic part of modern technology: physical signals are converted into digital data and processed for further use.

Probability and statistics are used in signal processing, for example:

- To determine the average signal-to-noise ratio (SNR) across a circuit board and how much it varies.
- To determine the correlation between ambient temperature and efficiency.
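Both uses can be sketched in a few lines. The example below assumes SNR is computed in decibels from signal and noise power, and uses the Pearson correlation coefficient for the temperature relationship; all measurement values are hypothetical.

```python
from math import log10, sqrt
from statistics import mean, stdev

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power values."""
    return 10 * log10(signal_power / noise_power)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_mean, y_mean = mean(xs), mean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (signal, noise) power pairs in mW at four test points on a board.
measurements = [(100.0, 1.0), (100.0, 2.0), (80.0, 1.0), (120.0, 1.5)]
snr_values = [snr_db(s, n) for s, n in measurements]
snr_mean = mean(snr_values)     # average SNR across the board
snr_spread = stdev(snr_values)  # how much it varies (sample standard deviation)

# Hypothetical ambient temperature (deg C) versus converter efficiency (%).
temperature_c = [20, 25, 30, 35, 40]
efficiency_pct = [94.0, 93.5, 93.1, 92.4, 91.8]
r = pearson(temperature_c, efficiency_pct)

print(f"mean SNR: {snr_mean:.1f} dB, spread: {snr_spread:.1f} dB")
print(f"temperature/efficiency correlation: {r:.3f}")
```

A correlation close to -1 in this made-up data set would indicate that efficiency falls almost linearly as temperature rises.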

## Techniques Used In Statistical Analysis

Some basic techniques commonly used are mentioned below.

### Stochastic Process

The word “stochastic” means random. A stochastic process is a collection of random variables indexed by time, so its value changes randomly from one time interval to the next and is difficult to predict exactly. It is described by its states (the value of the random variable X at a given time) and its state space (the set of all values the random variable can take). Stochastic processes are widely used in signal processing and in industry.
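A simple random walk is one of the most basic stochastic processes and makes the idea of states and state space concrete. In this sketch, the function name `random_walk` and the fixed seed are choices made for the example: the state at each time step is an integer position, and the state space is the set of all integers.

```python
import random

def random_walk(steps, seed=None):
    """Simulate a simple random walk: start at 0, move +1 or -1 each step."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path

path = random_walk(10, seed=42)
print(path)  # one realization of the process: the state at times 0..10
```

Running the function with a different seed gives a different path, which is exactly what makes the process random: only its statistical properties, not its individual values, can be predicted.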

### Normal Distribution

Data can be distributed in many ways. A normal distribution follows a bell curve: most of the data lies close to the central value, and the rest is spread symmetrically to its left and right.

In a normal distribution, mean = median = mode. The data is symmetric, with half of the values less than the mean and half of the values greater than the mean.
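These properties can be checked with `statistics.NormalDist` from the Python standard library (available since Python 3.8). The resistor with a nominal value of 100 Ω and a standard deviation of 2 Ω is a hypothetical example.

```python
from statistics import NormalDist

# Hypothetical resistor population: nominal 100 ohm, standard deviation 2 ohm.
resistance = NormalDist(mu=100.0, sigma=2.0)

# Symmetry: exactly half of the probability lies below the mean.
below_mean = resistance.cdf(100.0)

# Share of values within one and two standard deviations of the mean.
within_1 = resistance.cdf(102.0) - resistance.cdf(98.0)
within_2 = resistance.cdf(104.0) - resistance.cdf(96.0)

print(resistance.mean, resistance.median, resistance.mode)  # all equal
print(f"below mean: {below_mean}")
print(f"within 1 sigma: {within_1:.3f}, within 2 sigma: {within_2:.3f}")
```

Roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two, which is why most of the data sits close to the central value.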

### Standard Deviation

Standard deviation is a measure of how far values typically lie from the central value, or mean. In other words, it tells how spread out the numbers are. A low standard deviation means that the values are close to the mean, whereas a high standard deviation means that the values are spread further away from the mean.

It is equal to the square root of the variance and is represented by the Greek letter sigma (σ).
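The difference between a low and a high standard deviation can be seen by comparing two small data sets with the same mean. The values below are invented for illustration, and the sketch assumes the population standard deviation (`statistics.pstdev`, which divides by N); `statistics.stdev` would give the sample version, which divides by N - 1.

```python
from statistics import mean, pstdev

# Two hypothetical sets of voltage readings with the same mean of 5 V.
tight = [4.9, 5.0, 5.1, 5.0, 5.0]  # values cluster near the mean
wide = [3.0, 7.0, 5.0, 4.0, 6.0]   # values are spread far from the mean

sigma_tight = pstdev(tight)
sigma_wide = pstdev(wide)

print(f"tight: mean={mean(tight):.2f}, sigma={sigma_tight:.3f}")
print(f"wide:  mean={mean(wide):.2f}, sigma={sigma_wide:.3f}")
```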

### Variance

Variance is defined as the average of the squared differences from the mean. It is the square of the standard deviation.
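The definition can be followed step by step and then cross-checked against the standard library. The data set is an arbitrary example chosen so that the numbers come out whole: the mean is 5, the variance is 4, and the standard deviation is 2. As an assumption, the population variance (dividing by N) is used, since that matches "the average of the squared differences".

```python
from math import isclose
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values

mu = mean(data)                             # step 1: the mean (5)
squared_diffs = [(x - mu) ** 2 for x in data]  # step 2: squared differences
variance = mean(squared_diffs)              # step 3: their average (4)

# Cross-check against the library and against the standard deviation.
assert isclose(variance, pvariance(data))
assert isclose(variance, pstdev(data) ** 2)

print(f"mean={mu}, variance={variance}, standard deviation={pstdev(data)}")
```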

### Probability

Probability is the likelihood that an event will occur. The probability of an event that is certain to happen is 1, and the probability of an event that cannot occur is 0. For example, if a coin with heads on both sides is tossed, the probability of getting heads is 1 while the probability of getting tails is 0. In practice, probabilities are often estimated from collected data using statistics.
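The coin example can be simulated to show how a probability is estimated from observed data. In this sketch the function name `estimate_heads_probability`, the number of tosses, and the seed are assumptions made for illustration; a fair coin is included for comparison with the two-headed one.

```python
import random

def estimate_heads_probability(faces, tosses, seed=0):
    """Estimate P(heads) as the fraction of simulated tosses that show 'H'."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    heads = sum(1 for _ in range(tosses) if rng.choice(faces) == "H")
    return heads / tosses

fair = estimate_heads_probability(("H", "T"), 10_000)
two_headed = estimate_heads_probability(("H", "H"), 10_000)

print(f"fair coin:       P(heads) is approximately {fair:.3f}")
print(f"two-headed coin: P(heads) = {two_headed}")
print(f"two-headed coin: P(tails) = {1 - two_headed}")
```

The fair-coin estimate lands near 0.5 but not exactly on it, while the two-headed coin gives exactly 1 for heads and 0 for tails, matching the certain and impossible events described above.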

## Conclusion

Statistical analysis and probability play an important role in making systems work well. Many decisions in a project or industry are based on predictions and on visualization of the collected data. Various statistical methods are applied to make the best decisions, and statistical analysis is considered an important tool for quality control and quality assurance.

I am an electrical engineer from NED University, currently working in the oil and gas industry.

I have worked on projects related to power generation and distribution, as well as C++ and Python programming. I am an avid reader and love to read mystery, fiction, and fantasy novels, always on the lookout for something new.