## What Is Deep Learning with NLP?

Deep learning (also known as deep structured learning or differential programming) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised, or unsupervised. [Wikipedia] Natural language processing (NLP) is a field of machine learning concerned with a computer's ability to understand, analyze, manipulate, and potentially generate human language. [Medium]
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Real | 0.94 | 0.99 | 0.96 | 1095 |
| Fake | 0.92 | 0.64 | 0.76 | 203 |
| Micro Avg | 0.94 | 0.94 | 0.94 | 1298 |
| Macro Avg | 0.93 | 0.82 | 0.86 | 1298 |
| Weighted Avg | 0.93 | 0.94 | 0.93 | 1298 |
| Samples Avg | 0.94 | 0.94 | 0.94 | 1298 |
Train Time: 42.8 seconds
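A report in the format above can be produced with scikit-learn's `classification_report`. This is only a sketch with placeholder labels, not the project's actual predictions:

```python
from sklearn.metrics import classification_report

# Placeholder labels standing in for the test-set predictions
# (the real predictions come from the trained model on the Git repository).
y_true = ["Real", "Real", "Fake", "Real", "Fake", "Real"]
y_pred = ["Real", "Real", "Fake", "Fake", "Fake", "Real"]

report = classification_report(y_true, y_pred)
print(report)
```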
Findings: In Keras, neural networks (NNs) are built as a linear stack of layers called a Sequential model. This model is very versatile, with many parameters that can be tuned. A particularly useful feature is that we can monitor the model's performance while it is training using callbacks, functions that can be passed to `.fit()`. To compare with other models, we used a subset of the data with a 1:5 ratio of fake to real posts. To run the NN the data needs to be pre-processed: features were first normalized using StandardScaler, and labels were one-hot encoded. For the optimizer we used the most common optimization algorithm, Stochastic Gradient Descent (SGD). The graphs below show the results of the parameter optimization: the learning rate, the momentum, the decay, and the number of layers. The Python code is available in the Git repository. After parameter optimization, the model took 42.8 s to train and had a precision of 0.92 and 0.94 for the fake and real posts, respectively.
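The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the tuned model from the repository: the layer sizes, learning rate, momentum, and the random stand-in data are all assumptions, and the binary labels are kept as 0/1 with a sigmoid output rather than one-hot encoded.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Toy stand-in for the post features: 100 samples, 20 features
# (the actual feature matrix comes from the NLP pre-processing step).
X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=100)  # 0 = real, 1 = fake

# Normalize features to zero mean and unit variance, as described above.
X = StandardScaler().fit_transform(X)

# Sequential model: a linear stack of layers.
model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Callbacks passed to .fit() let us watch training as it happens;
# EarlyStopping halts training when the validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
history = model.fit(X, y, epochs=5, validation_split=0.2,
                    callbacks=[early_stop], verbose=0)
```

After `.fit()` returns, `history.history` holds the per-epoch loss and accuracy curves, which is what makes the parameter-optimization graphs mentioned above possible.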