The collapse of the Enron Corporation, which filed for bankruptcy in December 2001, motivated this project. The project investigates the fraud case using the large data set associated with it. The data set mainly comprises the millions of e-mails sent to and from the company's executives between 2000 and 2002. The e-mails were reported to be suspicious, but with so much material it was not practical for anyone to judge their nature by hand.
The need to decide the nature of the data from its patterns is what calls for a machine learning project. The financial information contains a huge number of numeric values, which makes manual classification tedious. A machine learning application can classify the data automatically and produce the desired output.
Project Implementation
The first step is to explore the data, which has around 21 variables and 146 observations. The outlier investigation checks for odd patterns in the data, such as employees recorded as earning an unusually large salary. Next, we create new features that describe the e-mails received from and sent to persons of interest (POIs). We then select the most important features for the observations, such as stock options, shared receipt, loan advance, long term incentive, and salary.
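The outlier check and the POI e-mail features described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the records, the field names (`from_poi`, `received`, `to_poi`, `sent`), the helper functions, and the "five times the median" cutoff are all assumptions made up for the example.

```python
from statistics import median

# Toy records; every name and number is invented for illustration only.
records = {
    "PERSON A": {"salary": 250_000, "from_poi": 10, "received": 200, "to_poi": 5, "sent": 50},
    "PERSON B": {"salary": 310_000, "from_poi": 40, "received": 100, "to_poi": 30, "sent": 60},
    "PERSON C": {"salary": 190_000, "from_poi": 0, "received": 0, "to_poi": 0, "sent": 0},
    "PERSON D": {"salary": 270_000, "from_poi": 3, "received": 300, "to_poi": 1, "sent": 20},
    "PERSON E": {"salary": 26_000_000, "from_poi": 0, "received": 0, "to_poi": 0, "sent": 0},
}


def flag_salary_outliers(data, factor=5.0):
    """Return the names whose salary exceeds `factor` times the median salary."""
    cutoff = factor * median(row["salary"] for row in data.values())
    return sorted(name for name, row in data.items() if row["salary"] > cutoff)


def fraction(part, whole):
    """Share of `part` in `whole`, or 0.0 when there are no messages at all."""
    return part / whole if whole else 0.0


def add_poi_email_features(data):
    """Add the share of each person's mail that came from or went to a POI."""
    for row in data.values():
        row["fraction_from_poi"] = fraction(row["from_poi"], row["received"])
        row["fraction_to_poi"] = fraction(row["to_poi"], row["sent"])
    return data


outliers = flag_salary_outliers(records)
# Drop the flagged rows before building features on the remaining people.
clean = {name: row for name, row in records.items() if name not in outliers}
add_poi_email_features(clean)

print("outliers:", outliers)
for name, row in clean.items():
    print(name, round(row["fraction_from_poi"], 3), round(row["fraction_to_poi"], 3))
```

Ratios are used instead of raw counts so that people who send or receive a lot of mail overall do not look suspicious merely because of volume. A flagged row should be inspected before it is dropped, since an extreme value can be either a data-entry artifact or a genuine signal.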
The algorithms found well suited to studying the data are Gaussian Naive Bayes, the support vector machine, and the decision tree classifier. A crucial part of machine learning is implementing and tuning the algorithm. The GridSearchCV tool provided in scikit-learn is used to tune the algorithm. To get as much information as possible out of the data, a validation strategy such as nested stratified shuffle-split cross-validation is used.
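As a rough sketch of how the three classifiers might be tuned with GridSearchCV, the example below runs a small grid for each one over stratified shuffle splits. The synthetic data from `make_classification` only stands in for the real features, and the parameter grids, the F1 scoring choice, and the split settings are assumptions for illustration rather than the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features: 146 rows with imbalanced classes.
X, y = make_classification(
    n_samples=146,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    weights=[0.85, 0.15],
    random_state=42,
)

# Stratified shuffle splits keep the class ratio similar in every split.
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=42)

# Each candidate pairs an estimator with a small, illustrative parameter grid.
candidates = {
    "gaussian_nb": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "svm": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "decision_tree": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [2, 4, None], "min_samples_split": [2, 5, 10]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="f1", cv=cv)
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)
    print(name, search.best_params_, round(search.best_score_, 3))
```

F1 is used here rather than accuracy because a classifier that labels everyone as a non-POI would already score high accuracy on data this imbalanced.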
This method helps us extract the essential information from the whole heap of data. Hyperparameter optimization is the process of improving a model's performance by tuning its parameters. Cross-validation checks that the patterns a model has found also hold on data it was not trained on, so that the reported results can be trusted. The decision tree classifier is evaluated with the cross-validation method defined in the tester.py script.
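One way to picture the nested validation is an inner loop that tunes the decision tree and an outer loop that scores the tuned model on splits it never saw during tuning. The sketch below assumes synthetic data and illustrative grid values; it does not reproduce tester.py, whose contents are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedShuffleSplit,
    cross_validate,
)
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features: 146 rows with imbalanced classes.
X, y = make_classification(
    n_samples=146,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    weights=[0.85, 0.15],
    random_state=0,
)

# Inner loop: picks hyperparameters. Outer loop: estimates performance.
inner_cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
outer_cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=1)

tuned_tree = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 4, None], "min_samples_leaf": [1, 3, 5]},
    scoring="f1",
    cv=inner_cv,
)

# Each outer split refits the whole grid search on its training part only.
metrics = ("precision", "recall", "f1")
scores = cross_validate(tuned_tree, X, y, cv=outer_cv, scoring=list(metrics))
mean_scores = {m: scores[f"test_{m}"].mean() for m in metrics}
print(mean_scores)
```

Keeping tuning inside the outer loop means the reported precision and recall are not inflated by hyperparameters that were chosen on the same data used for scoring.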
Results and Conclusion
The application will thus be able to classify this huge data set of almost 1.67 million e-mails. The data is processed through the algorithms and methods described above in order to detect the underlying problem. The application highlights the odd data points, which can be considered possible signs of fraud, and it can play an important role in the investigation of Enron.