Artificial Intelligence and Machine Learning: A collation of different Machine Learning Classifiers
Greetings, my fellow readers, and welcome to my first post on Medium.com. As an undergrad student of Computer Science and Engineering, I came across this inevitable hot topic called Machine Learning, which is expected to shape the future we are being readied for. Having an AI course as one of our core subjects was a boon to us as future runners of the computing industry, since technological advancement keeps racing ahead at light speed. Being lucky enough to grasp this concept is something to welcome, and why wouldn't it be?
Let’s delve into the topic now, shall we?
Lock, Load and Read: Loading the Data.
First, we need to load (read) the data to begin the coding portion, and we can do that with the pandas library.
Pandas lets us perform various operations on the data, and once it’s loaded we can create and use different models on it. Loading the data set gives us something like this.
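As a sketch of this step (the post’s screenshots aren’t reproduced here, so a tiny inline CSV stands in for the real file; the column names are assumptions based on the Result of Treatment target mentioned below):

```python
import pandas as pd
from io import StringIO

# A tiny inline CSV standing in for the real data file -- the column names
# here are assumptions, not the post's actual schema.
csv_data = StringIO("""sex,age,Time,Number_of_Warts,Type,Area,Result_of_Treatment
1,35,12.0,5,1,100,0
1,29,7.0,5,1,96,1
2,50,8.0,1,3,132,0
2,32,11.75,7,3,750,0
1,67,9.25,1,1,42,1
2,41,8.0,2,2,20,1
""")

# In practice this would be pd.read_csv("your_dataset.csv").
df = pd.read_csv(csv_data)
print(df.head())
```

`df.head()` shows the first few rows, which is a quick way to confirm the data loaded as expected.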
But the data needs to be in a specific shape before we feed it into our models, so we have to process it accordingly.
Make the Data Lit!: Data Pre-Processing.
This step is crucial: it lets us encode and normalize our bizarre data set(s) so that we get more accurate results. First, store our target feature, i.e. the Result of Treatment column, in a separate data frame as follows.
As our data set is not that bizarre, there are no non-numeric values, so we don’t need any kind of encoding (label, one-hot, etc.).
But we do need to normalize it, i.e. bring every numerical value in the data set into the range of 0 to 1 (both inclusive), which we can do with this code.
Now, our entire dataset is normalized as below.
But! But! We made a huge MISTAKE: we normalized our target feature, i.e. the Result of Treatment column, as well.
Luckily, we stored the original in a separate variable, so we just need to drop the normalized target feature column like so.
Now, our data looks like this:
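The whole pre-processing sequence can be sketched like this (a tiny stand-in data frame replaces the real data set, and the column names are assumptions):

```python
import pandas as pd

# A tiny stand-in for the real data set -- column names are assumptions.
df = pd.DataFrame({
    "age":  [35, 29, 50, 32, 67, 41],
    "Time": [12.0, 7.0, 8.0, 11.75, 9.25, 8.0],
    "Area": [100, 96, 132, 750, 42, 20],
    "Result_of_Treatment": [0, 1, 0, 0, 1, 1],
})

# Step 1: stash the original target before we transform anything.
target = df["Result_of_Treatment"].copy()

# Step 2: min-max normalize every column into [0, 1].
df_norm = (df - df.min()) / (df.max() - df.min())

# Step 3: drop the (accidentally normalized) target column again.
df_norm = df_norm.drop(columns=["Result_of_Treatment"])

print(df_norm.round(2))
```

Because the original target was saved in step 1, dropping the mangled column in step 3 costs us nothing.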
Splice and Dice: Splitting the Data.
Now, we need to split our data into training and testing sets. We’ll use sklearn’s train_test_split to do this.
Here, I took the test size to be 60 percent of the total data set. You can take whatever size you want and experiment with what gives you the best results for the models below.
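A minimal sketch of the split (random arrays stand in for the normalized features and the saved target from earlier; the shapes are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for the normalized features and the saved target.
X = np.random.rand(90, 6)           # 90 samples, 6 features (assumed shape)
y = np.random.randint(0, 2, 90)     # binary target

# test_size=0.6 mirrors the 60 percent split chosen in the post.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=42)

print(X_train.shape, X_test.shape)  # (36, 6) (54, 6)
```

Fixing `random_state` makes the split reproducible, which keeps the accuracy comparisons below fair across classifiers.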
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements. — Wikipedia.org
Let’s write the code for this, and we’re going to use the Sklearn DecisionTreeClassifier as:
Now, let’s calculate the accuracy of this as:
Which gives this result:
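A sketch of this step, with synthetic data standing in for the real features and target (the hyperparameters here are assumptions, not the post’s exact settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic binary-classification data as a stand-in for the real data set.
X, y = make_classification(n_samples=90, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=42)

# Fit a CART decision tree and score it on the held-out test set.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Decision tree accuracy: {acc:.2f}")
```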
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set. — Wikipedia.org
They’re basically a collection of a lot of decision trees.
Let’s write the code for this, and we’re going to use the RandomForestClassifier of sklearn as:
Let’s calculate the accuracy of this as:
Which gives this result:
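Sketched the same way as the decision tree above, again on stand-in synthetic data (the number of trees is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in data, split the same way as before.
X, y = make_classification(n_samples=90, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=42)

# An ensemble of 100 decision trees; predictions are made by majority vote.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Random forest accuracy: {acc:.2f}")
```

Averaging over many trees is what dampens the single tree’s tendency to overfit.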
Logistic regression is basically a supervised classification algorithm. The model builds a regression model to predict the probability that a given data entry belongs to the category numbered as “1”. Logistic regression becomes a classification technique only when a decision threshold is brought into the picture. The setting of the threshold value is a very important aspect of Logistic regression and is dependent on the classification problem itself. — Wikipedia.org
Let’s write the code for this, and we’re going to use sklearn’s LogisticRegression as:
Now, let’s calculate its accuracy as:
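A sketch on the same stand-in data (`max_iter` is an assumption added so the solver converges):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in data, split the same way as before.
X, y = make_classification(n_samples=90, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=42)

# predict() applies the default 0.5 probability threshold for us.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Logistic regression accuracy: {acc:.2f}")
```

If you need the raw probabilities rather than thresholded labels, `clf.predict_proba(X_test)` returns them.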
The perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. — Wikipedia.org
A perceptron has 5 components: input, bias, weights (parameters), an activation function (the most common being the unit step function), and output.
Let’s write the code for this, and we’re going to use sklearn’s Perceptron as:
Now, let’s calculate its accuracy as:
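The same sketch pattern works here too (synthetic stand-in data; hyperparameters are left at their defaults):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in data, split the same way as before.
X, y = make_classification(n_samples=90, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=42)

# A single linear unit: weighted sum of the features plus a bias,
# thresholded to produce the binary prediction.
clf = Perceptron(random_state=42)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Perceptron accuracy: {acc:.2f}")
```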
A feedforward neural network is an artificial neural network wherein connections between the nodes do not form a cycle. As such, it is different from recurrent neural networks. The feedforward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network. — Wikipedia.org
We’re going to use Keras’ Sequential and Dense as:
Sequential tells Keras that we are building the model sequentially: the output of each layer we add becomes the input of the next layer we specify.
model.add is used to add a layer to your neural network, and we specify what kind of layer we want as an argument. A Dense layer is a fully connected layer; its first argument is the output dimension, which is 5000 here (an arbitrary choice you can experiment with).
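Putting those pieces together, here is a minimal sketch (random stand-in data; the input shape, loss, optimizer, and epoch count are assumptions, with only the 5000-unit hidden layer taken from the text):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Random stand-in data: 90 samples, 6 features, binary target (assumed shapes).
X = np.random.rand(90, 6).astype("float32")
y = np.random.randint(0, 2, size=90)

model = Sequential([
    keras.Input(shape=(6,)),
    Dense(5000, activation="relu"),   # the post's arbitrary 5000-unit layer
    Dense(1, activation="sigmoid"),   # single sigmoid unit for binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)

loss, acc = model.evaluate(X, y, verbose=0)
print(f"Neural network accuracy: {acc:.2f}")
```

The sigmoid output plays the same role as logistic regression’s probability, with the dense hidden layer feeding it learned features.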
After running all 5 classifiers, we found that the Random Forest Classifier did best on this data set, achieving the highest accuracy, with Logistic Regression in second, Decision Tree (CART) in third, Perceptron in fourth, and the Neural Network coming in last.