What Is SVM? In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). [Wikipedia]
Train Time: 5.41 seconds
Findings: The support vector machine model performed poorly before hyperparameter tuning. Although its overall accuracy was .95, its precision was only .67, by far the lowest score we’ve seen so far, even compared with the other models before tuning. After we calculated and applied the best parameters, precision rose to .84 and overall accuracy rose to .97. Tuning made this model comparable to the others at classifying fake job postings from the non-textual data, but it was still the least effective of the bunch.
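To see how a model can score .95 on accuracy while its precision sits near .67, here is a minimal sketch in plain Python. The confusion-matrix counts are hypothetical, chosen only to reproduce that pattern on an imbalanced set of 1,000 postings. They are not the actual counts from our SVM run.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all postings that were labeled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of postings flagged as fraudulent that really were fraudulent."""
    if tp + fp == 0:
        return 0.0  # nothing was flagged, so precision is reported as 0 here
    return tp / (tp + fp)


# Hypothetical counts (assumption, for illustration only):
# 60 fraudulent and 940 legitimate postings.
counts = {"tp": 20, "fp": 10, "fn": 40, "tn": 930}

acc = accuracy(**counts)
prec = precision(counts["tp"], counts["fp"])
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Because legitimate postings dominate, the large number of true negatives keeps accuracy high even when a third of the flagged postings are false alarms. That is why precision is worth reporting alongside accuracy for this kind of data.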
Even though the main objective of our project was to explore Natural Language Processing and see how effectively these models could read and interpret text, we also wanted to see how accurately the same models could predict fraudulent job postings from some of the other data available in our dataset. The same features were used with each model. They included whether a company logo was provided, whether a salary and/or benefits were provided, industry, required experience, required education, and employment type, among others. Without the job description itself there didn’t seem to be much data to rely on, so we weren’t sure what to expect.
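As a rough illustration of how non-textual fields like these can be turned into model inputs, the sketch below encodes one posting as a list of 0/1 values: yes/no flags for the "was it provided" fields and one-hot columns for the categorical ones. The field names, category values, and the `encode_posting` helper are all assumptions made for this example, not the exact preprocessing we ran.

```python
def encode_posting(posting, categories):
    """Turn one posting (a dict) into a flat list of 0/1 features."""
    # Presence flags: 1 if the field is present and non-empty, else 0.
    features = [
        1 if posting.get("has_company_logo") else 0,
        1 if posting.get("salary_range") else 0,
        1 if posting.get("benefits") else 0,
    ]
    # One-hot encode each categorical field against its known values.
    for field, values in categories.items():
        value = posting.get(field)
        features.extend(1 if value == v else 0 for v in values)
    return features


# Hypothetical category lists (assumption, for illustration only).
categories = {
    "employment_type": ["Full-time", "Part-time", "Contract"],
    "required_experience": ["Entry level", "Mid-Senior level"],
}

example = {
    "has_company_logo": True,
    "salary_range": "",  # no salary listed
    "benefits": "Health insurance",
    "employment_type": "Full-time",
    "required_experience": "Entry level",
}

vector = encode_posting(example, categories)
print(vector)
```

Every model then receives the same fixed-length vector for each posting, which keeps the comparison between classifiers fair.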