Non-Textual: K Nearest Neighbors


What Is k-NN? In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. [Wikipedia]
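The definition above can be sketched with a minimal example (toy data, not the project's dataset): the predicted label is simply a majority vote among the k closest training points in feature space.

```python
# Minimal k-NN classification sketch on hypothetical toy data.
from sklearn.neighbors import KNeighborsClassifier

# Two features per point; label 1 marks the positive (e.g. fraudulent) class.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Each query point takes the majority label of its 3 nearest neighbors.
pred = model.predict([[0.5, 0.5], [5.5, 5.5]])
print(pred.tolist())  # → [0, 1]
```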

Train Time: 0.29 seconds

Findings: The K Nearest Neighbors model performed reasonably well at identifying fraudulent postings even before hyperparameter tuning, with a precision of 0.80 on the fraudulent class and an overall accuracy of 0.96. After tuning, precision rose to 0.85 and accuracy edged up to 0.97. It is not the most effective model of the ones we tested, but it certainly holds its own.
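The tuning step described above can be sketched with a grid search over k-NN hyperparameters, scored on precision; the exact grid and data here are illustrative assumptions, not the project's actual configuration.

```python
# Sketch of k-NN hyperparameter tuning on synthetic, imbalanced data
# (fraud-like: roughly 10% positives). The parameter grid is an assumption.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9],
                "weights": ["uniform", "distance"]},
    scoring="precision",  # precision on the positive (fraudulent) class
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 2))
```

Cross-validated precision, rather than plain accuracy, is the natural scoring choice here because the classes are heavily imbalanced.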

Even though the main objective of our project was to explore Natural Language Processing and see how effective these models were at reading and interpreting text, we wanted to see how accurately the same models could predict fraudulent job postings from the other available data in our dataset. The same features were used with each model; they included whether or not a company logo was provided, whether or not a salary and/or benefits were listed, industry, required experience, required education, employment type, and so on. Without the job description itself there did not seem to be much data to rely on, so we were not sure what to expect.
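Before feeding features like these to k-NN, the categorical fields need a numeric encoding. A sketch of one way to do that (column names are illustrative, not the dataset's exact schema): boolean flags pass through unchanged, and categorical fields are one-hot encoded.

```python
# Sketch of preparing non-textual job-posting features for k-NN.
# Column names are hypothetical examples of the feature types described.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

jobs = pd.DataFrame({
    "has_company_logo":    [1, 0, 1],
    "has_salary":          [1, 0, 0],
    "employment_type":     ["Full-time", "Part-time", "Full-time"],
    "required_experience": ["Mid-Senior", "Entry", "Entry"],
})

encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"),
      ["employment_type", "required_experience"])],
    remainder="passthrough",  # keep the boolean flags as-is
)
X = encode.fit_transform(jobs)
print(X.shape)  # 3 rows; 4 one-hot columns + 2 boolean flags
```

Because k-NN is distance-based, it is also common to scale features so that no single column dominates the neighbor computation.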

Model Comparison

Features      Model   Precision (Fake)
NLP           k-NN    0.76
NLP           SVM     0.89
NLP           DL      0.92
NLP           NB      0.84
NLP           RF      0.58
Non-Textual   k-NN    0.85
Non-Textual   SVM     0.84
Non-Textual   DL      0.86
Non-Textual   RF      1.00