A simple pipeline that constantly polls the GitHub API at fixed intervals and performs operations on the resulting streams of GitHub repository data to display useful insights on a dashboard
Developed a streaming pipeline that displays real-time analytics of
GitHub repositories on a dashboard
Used Docker containers to deploy the Spark cluster that performs
distributed processing of the data returned by the API requests
Designed the dashboard to display the analytics results using Flask
The cluster uses PySpark to perform the relevant operations
such as filtering and map-reduce-style aggregations
Since the project was designed to run locally due to API rate
limits, Redis was used as a local database to cache data (a sketch of the polling-and-processing loop follows below)
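
A minimal sketch of that loop, assuming a local Spark session, a local Redis instance, and the public GitHub search endpoint; the query, the Redis key name, and the 60-second interval are illustrative guesses, not the project's actual values:

    import json, time
    import requests
    import redis
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("github-insights").getOrCreate()
    cache = redis.Redis(host="localhost", port=6379)

    def fetch_repos(query="stars:>100"):
        # one polling request against the GitHub search API
        resp = requests.get("https://api.github.com/search/repositories",
                            params={"q": query, "per_page": 50}, timeout=10)
        resp.raise_for_status()
        return resp.json()["items"]

    while True:
        items = fetch_repos()
        cache.set("latest_repos", json.dumps(items))  # cache the raw payload locally
        df = spark.createDataFrame(
            [(i["full_name"], i["stargazers_count"], i["language"]) for i in items],
            ["repo", "stars", "language"])
        # filtering plus a map-reduce-style aggregation, distributed over the cluster
        df.filter(df.stars > 500).groupBy("language").avg("stars").show()
        time.sleep(60)  # polling interval chosen to respect API rate limits
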
Full Stack E-Commerce Website
A simple web development project to build a full-stack e-commerce site that consists of:
The Presentation Layer (frontend), The Business Layer (backend), and The Data Layer (database)
Technology Stack for this project:
React
Spring Boot
PostgreSQL
Docker
Highlights:
Created and planned the designs for the project using Figma
Used Spring Boot to build the backend of the system because of
the various modules it ships with built-in, notably its
ORM support, which makes communicating with the database much simpler
Used React along with Tailwind CSS to design the user interface for the website
Vite was chosen as the bundler for the project due to its ability to trim out
unused modules and compress the whole project, which results in an overall smaller
size footprint, improving responsiveness to user interactions
Due to the reliability of relational databases and the small scale of this project, PostgreSQL
was chosen as the data-persistence option
The entire project was containerized using Docker so that each component stays portable
and migrating to another cloud provider is easier if the need were to arise; the
application was deployed to the cloud on Render
Association Rule Mining using Apriori Algorithm
A Python script that takes in a CSV file and finds rules whose support is higher than the threshold given by the user
A Python program that processes a collection of itemsets, in
this case the Walmart dataset, to find interesting patterns in the
data
The program uses the Apriori algorithm to prune itemsets that don't meet
a given support threshold, i.e. the fraction of the dataset's
transactions that contain the itemset
Rules that are above the given support threshold are generated from
the frequent itemsets that remain after pruning
Some of the generated rules can be misleading. To prevent
this, a lift measure was used to ignore rules whose lift is <= 1.0,
i.e. rules where the antecedent does not actually make the consequent
more likely (a sketch of the approach follows below)
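
A condensed sketch of the idea, assuming each CSV row is one transaction of item names; the function names and default thresholds are illustrative, not the script's actual ones:

    import csv
    from collections import defaultdict
    from itertools import combinations

    def load_transactions(path):
        with open(path) as f:
            return [set(filter(None, row)) for row in csv.reader(f)]

    def apriori(transactions, min_support):
        n = len(transactions)
        counts = defaultdict(int)
        for t in transactions:                      # count 1-itemsets
            for item in t:
                counts[frozenset([item])] += 1
        frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
        all_frequent, k = dict(frequent), 2
        while frequent:
            # join step: build size-k candidates from frequent (k-1)-itemsets
            candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
            counts = defaultdict(int)
            for t in transactions:
                for c in candidates:
                    if c <= t:
                        counts[c] += 1
            # prune step: keep only candidates that meet the support threshold
            frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
            all_frequent.update(frequent)
            k += 1
        return all_frequent

    def rules(freq, min_lift=1.0):
        out = []
        for itemset, support in freq.items():
            for i in range(1, len(itemset)):
                for lhs in map(frozenset, combinations(itemset, i)):
                    rhs = itemset - lhs
                    lift = support / (freq[lhs] * freq[rhs])
                    if lift > min_lift:             # lift <= 1.0 means misleading
                        out.append((set(lhs), set(rhs), support, lift))
        return out
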
Building and Training a Decision Tree Model Classifier
A Python program that uses various helper functions to build and train a decision tree on the provided dataset, then predicts the class of unlabelled data using the learnt tree.
The program lets us set the threshold that determines whether a node
of the tree becomes a leaf node or an internal node
To decide which attribute to split a node on, the Information
Gain criterion was used to rank attributes and pick the one with
the highest gain, i.e. the reduction in entropy achieved by splitting
on that attribute
Node splitting starts from the root node holding all the
examples and proceeds greedily by Information Gain, building the
decision tree until a termination condition is satisfied, i.e. the number of
examples in a node is less than or equal to the threshold
Split the major components of the algorithm into their own helper
functions to make debugging easier and keep the code cleaner
Used the above components, wrapped in helper functions, to train the
final decision tree and perform classification on the testing data, which
does not have the target attribute (a sketch of the helpers follows below)
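
A minimal sketch of those helpers, assuming categorical attributes and examples stored as dicts; names such as build_tree and leaf_threshold are illustrative, not the program's actual identifiers:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, attr, target):
        # entropy before the split minus the weighted entropy after it
        gain = entropy([e[target] for e in examples])
        for value in {e[attr] for e in examples}:
            subset = [e[target] for e in examples if e[attr] == value]
            gain -= len(subset) / len(examples) * entropy(subset)
        return gain

    def build_tree(examples, attrs, target, leaf_threshold=5):
        labels = [e[target] for e in examples]
        # termination: few enough examples, pure node, or no attributes left
        if len(examples) <= leaf_threshold or len(set(labels)) == 1 or not attrs:
            return Counter(labels).most_common(1)[0][0]   # leaf = majority class
        best = max(attrs, key=lambda a: information_gain(examples, a, target))
        children = {}
        for value in {e[best] for e in examples}:
            subset = [e for e in examples if e[best] == value]
            children[value] = build_tree(subset, [a for a in attrs if a != best],
                                         target, leaf_threshold)
        return {"attr": best, "children": children}

    def classify(tree, example):
        while isinstance(tree, dict):
            # unseen attribute values fall through as None in this sketch
            tree = tree["children"].get(example[tree["attr"]])
        return tree
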
Imbalanced Learning of a Model using scikit-learn
A Python script that measures the performance of various classification models against an imbalanced dataset, i.e. one whose class distribution is heavily skewed towards one class. The skew gives the learnt models deceptively high numerical accuracy, because the models assign almost all new examples to the majority class.
Used scikit-learn to test a number of machine learning classifiers against
a number of datasets
Used pandas DataFrames to make preprocessing easier and to clean
the data, aiding learning from the training data
The performance of the models was recorded and compared to decide on
a model amongst the following candidates:
Decision Tree Classifier
K-Nearest Neighbor Classifier with K = 1 and 3
Gaussian Naive Bayes
Logistic Regression
Multi-Layer Perceptron (MLP) Neural Network
Random Forest Classifier
The final model was used to learn a classifier from an
imbalanced dataset of credit card transactions and predict fraudulent
transactions
This lets us examine the class that would usually be
overlooked due to the skewed distribution, which more often than not is
the class that is actually of interest (a comparison sketch follows below)
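
A sketch of the comparison harness, assuming the credit-card fraud CSV has a Class target column; the file name, split ratio, and hyperparameters here are assumptions:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    df = pd.read_csv("creditcard.csv")                  # assumed file name
    X, y = df.drop(columns=["Class"]), df["Class"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    models = {
        "decision tree": DecisionTreeClassifier(),
        "1-NN": KNeighborsClassifier(n_neighbors=1),
        "3-NN": KNeighborsClassifier(n_neighbors=3),
        "naive bayes": GaussianNB(),
        "logistic regression": LogisticRegression(max_iter=1000),
        "MLP": MLPClassifier(max_iter=500),
        "random forest": RandomForestClassifier(),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        # per-class precision/recall exposes the minority class,
        # which raw accuracy hides on an imbalanced dataset
        print(name)
        print(classification_report(y_test, model.predict(X_test)))
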
Connect 4 Game against AI agent
A Python program that allows the user to play a game of Connect 4 against different AI agents, each of which uses a different decision-making algorithm to pick its next best move. Connect 4 is a solved game, so the player can win against the agent by playing the right moves.
Developed a Connect 4 game in Python to implement an AI agent
The game incorporates various algorithms such as: MiniMax
(depth-limited), Alpha-Beta pruning, and Expectimax
Each algorithm improves upon the previous: Alpha-Beta pruning,
for example, helps us search faster as the trees get deeper by efficiently
skipping branches that cannot change the outcome (a sketch follows after this list)
Expectimax is a probabilistic algorithm that takes into account the
probability of each opponent move and the resulting chances of the
agent winning, losing, or drawing to determine its choices
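
A sketch of the depth-limited MiniMax search with Alpha-Beta pruning, assuming a 6x7 board of ints (0 empty, 1 agent, 2 opponent) and a deliberately crude evaluation; a real agent would also score partial lines:

    import math

    ROWS, COLS = 6, 7

    def valid_moves(board):
        return [c for c in range(COLS) if board[0][c] == 0]

    def drop(board, col, player):
        new = [row[:] for row in board]
        for r in range(ROWS - 1, -1, -1):       # piece falls to the lowest gap
            if new[r][col] == 0:
                new[r][col] = player
                break
        return new

    def winner(board):
        for r in range(ROWS):
            for c in range(COLS):
                p = board[r][c]
                if p == 0:
                    continue
                for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                    if all(0 <= r + i * dr < ROWS and 0 <= c + i * dc < COLS
                           and board[r + i * dr][c + i * dc] == p for i in range(4)):
                        return p
        return 0

    def evaluate(board):
        # crude leaf score: win/loss only; partial lines are ignored here
        w = winner(board)
        return 1000 if w == 1 else -1000 if w == 2 else 0

    def minimax(board, depth, alpha, beta, maximizing):
        if depth == 0 or winner(board) or not valid_moves(board):
            return evaluate(board), None
        best = None
        if maximizing:
            value = -math.inf
            for col in valid_moves(board):
                score, _ = minimax(drop(board, col, 1), depth - 1, alpha, beta, False)
                if score > value:
                    value, best = score, col
                alpha = max(alpha, value)
                if alpha >= beta:               # prune: opponent won't allow this line
                    break
            return value, best
        value = math.inf
        for col in valid_moves(board):
            score, _ = minimax(drop(board, col, 2), depth - 1, alpha, beta, True)
            if score < value:
                value, best = score, col
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value, best

    # e.g. _, move = minimax([[0] * COLS for _ in range(ROWS)], 4, -math.inf, math.inf, True)
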
Dictionary Client
A Java project to learn how dictionary servers communicate using the DICT protocol. TCP is used as the transport to send and receive requests.
Developed a client-side dictionary application that makes requests to
various dictionary servers and parses their responses
Utilized the Java Socket API to establish communication between the client
and the dictionary servers
Referenced the Dictionary Server Protocol as specified in RFC 2229
to make the appropriate requests to the available dictionary servers and display the results in
the GUI (a protocol sketch follows below)
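
A minimal protocol walkthrough (sketched in Python for brevity; the project itself is in Java), assuming the public dict.org server on port 2628 and the DEFINE command from RFC 2229, with simplified status-code handling:

    import socket

    def define(word, host="dict.org", port=2628):
        # DICT is a line-based text protocol over TCP; lines end in CRLF
        with socket.create_connection((host, port), timeout=10) as sock:
            f = sock.makefile("rw", encoding="utf-8", newline="")
            f.readline()                        # 220 banner from the server
            f.write("DEFINE ! %s\r\n" % word)   # "!" = first database with a match
            f.flush()
            definitions = []
            line = f.readline()                 # 150 (definitions follow) or an error
            while line and line[:3] not in ("250", "501", "550", "552"):
                if line.startswith("151"):      # one definition body, ended by "."
                    body = []
                    while True:
                        text = f.readline()
                        if text.strip() == ".":
                            break
                        body.append(text.rstrip("\r\n"))
                    definitions.append("\n".join(body))
                line = f.readline()
            f.write("QUIT\r\n")
            f.flush()
            return definitions

    # e.g. print("\n\n".join(define("socket")))
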