Real Time Data Streaming Pipeline
A simple pipeline that constantly polls the GitHub API at fixed intervals and performs operations on the resulting streams of GitHub repository data to display useful insights on a dashboard

nadim365/EECS4415-Big Data Systems: Project 3

Technology Stack for this project:
Python, Redis, Docker, Apache Spark
Highlights:
  • Developed a streaming pipeline that displays real-time analytics of GitHub repositories on a dashboard
  • Used Docker containers to deploy the Spark cluster and perform distributed processing of the API requests
  • Designed the dashboard to display the analytics results using Flask
  • The cluster uses PySpark to perform the relevant operations such as filtering and map-reduce
  • Since the project was designed to run locally due to API rate limits, Redis was used as a local database to cache data
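The polling loop can be sketched roughly as follows, assuming the public GitHub REST API and a redis-py client for the cache; the cached fields and interval are illustrative, and the real pipeline hands the stream to PySpark rather than reducing it inline:

```python
import json
import time
import urllib.request

GITHUB_API = "https://api.github.com/repos/{owner}/{repo}"

def fetch_repo(owner: str, repo: str) -> dict:
    """One polling request against the GitHub REST API."""
    with urllib.request.urlopen(GITHUB_API.format(owner=owner, repo=repo)) as resp:
        return json.load(resp)

def extract_insights(payload: dict) -> dict:
    """Reduce a raw repository payload to the fields shown on the dashboard
    (the field choice here is illustrative)."""
    return {
        "stars": payload.get("stargazers_count", 0),
        "forks": payload.get("forks_count", 0),
        "open_issues": payload.get("open_issues_count", 0),
    }

def poll(owner: str, repo: str, cache, interval_s: int = 60):
    """Poll on a fixed interval, caching each snapshot (cache is assumed to be
    a redis-py client) so repeated dashboard reads don't burn the rate limit."""
    while True:
        snapshot = extract_insights(fetch_repo(owner, repo))
        cache.setex(f"{owner}/{repo}", interval_s, json.dumps(snapshot))
        time.sleep(interval_s)
```

Caching each snapshot with a TTL equal to the polling interval means the dashboard always reads from Redis and never triggers extra API calls.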
Full Stack E-Commerce Website
A simple web development project: a full-stack e-commerce site consisting of a presentation layer (frontend), a business layer (backend), and a data layer (database)
Technology Stack for this project:
React, Spring Boot, PostgreSQL, Docker
Highlights:
  • Planned and designed the project's UI mockups using Figma
  • Used Spring Boot to build the backend of the system because of its built-in modules, notably its ORM support, which makes communicating with the database much simpler
  • Used React along with Tailwind CSS to design the user interface for the website
  • Chose Vite as the bundler for its ability to trim out unused modules and compress the whole project, resulting in a smaller size footprint and more responsive user interactions
  • Chose PostgreSQL as the data persistence option due to the reliability of relational databases and the small scale of this project
  • Containerized each component of the application using Docker to keep the whole system portable and make migrations to other cloud providers easier if the need were to arise; the application was deployed to the cloud on Render
Association Rule Mining using Apriori Algorithm
A Python script that takes in a CSV file and finds association rules whose support exceeds the threshold given by the user

nadim365/EECS4412_A1

Technology Stack for this project:
Python, Scikit-Learn
Highlights:
  • A Python program that processes a collection of itemsets, in this case the Walmart dataset, to find interesting patterns in the data
  • The program uses the Apriori algorithm to prune candidate itemsets that don't meet a given support threshold, i.e. the percentage of transactions in the dataset that contain the itemset
  • Rules above the given support threshold are generated from the frequent itemsets that remain after pruning
  • Some of the generated rules can be misleading; to prevent this, a lift measure was used to discard rules whose lift is <= 1.0, i.e. misleading ones
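The level-wise search at the heart of Apriori can be sketched like this (a simplified stand-alone illustration, not the assignment code; the toy itemsets and the lift helper's signature are made up for the example):

```python
def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: keep only itemsets whose support meets the
    threshold, then grow candidates one item at a time from the survivors."""
    n = len(transactions)

    def support(itemset):
        # Support = fraction of transactions containing the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    frequent = {}
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    k = 1
    while current:
        kept = [c for c in current if support(c) >= min_support]
        frequent.update((c, support(c)) for c in kept)
        k += 1
        # Apriori pruning: only unions of frequent k-itemsets can be frequent.
        current = list({a | b for a in kept for b in kept if len(a | b) == k})
    return frequent

def lift(rule_support, antecedent_support, consequent_support):
    """Lift <= 1.0 marks a rule as uninformative or misleading."""
    return rule_support / (antecedent_support * consequent_support)
```

The pruning step is what makes Apriori tractable: any superset of an infrequent itemset is skipped without ever counting its support.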
Building and Training a Decision Tree Model Classifier
A Python program that uses various helper functions to build and train a decision tree from the provided dataset, then predicts the class of unlabelled data using the learnt tree.

nadim365/EECS4412_A2

Technology Stack for this project:
Python, Scikit-Learn, Matplotlib
Highlights:
  • The program lets us set the threshold that determines whether a node of the tree becomes a leaf node or an internal node
  • To decide which attribute to split a node on, the Information Gain criterion was used to rank attributes and pick the one with the highest gain, i.e. the information gained by splitting on that attribute
  • Node splitting starts from the root node containing all the examples and recursively splits based on Information Gain, building the decision tree until a termination condition is satisfied, i.e. the number of examples in a node is less than or equal to the threshold
  • Split the major components of the algorithm into their own helper functions to make debugging easier and keep the code cleaner
  • Used the above components, wrapped in helper functions, to train the final decision tree and perform classification on the testing data, which does not include the target attribute
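The Information Gain criterion described above can be sketched as follows (a simplified stand-alone version, not the assignment's actual helper functions):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_groups):
    """Gain = parent entropy minus the size-weighted entropy of the children
    produced by splitting on a candidate attribute."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - weighted
```

A split that separates the classes perfectly yields gain equal to the parent's entropy, while a split that leaves the class mix unchanged yields a gain of zero, which is why ranking attributes by gain picks the most discriminative split.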
Imbalanced Learning of a model using SciKit
A Python script that measures the performance of various classification models against an imbalanced dataset, i.e. a dataset whose class distribution is heavily skewed towards one class. The skew gives the learnt models a high numerical accuracy, because the models assign almost all new examples to the majority class.

nadim365/EECS4412_A3

Technology Stack for this project:
Python, Pandas, Scikit-Learn
Highlights:
  • Used scikit-learn to test a number of machine learning classifiers against a number of datasets
  • Used Pandas DataFrames to make preprocessing the dataset easier and clean the data to aid learning from the training data
  • The performance of the models was recorded and compared to select a model from among the following:
    • Decision Tree Classifier
    • K-Nearest Neighbor Classifier with K = 1 and 3
    • Gaussian Naive Bayes
    • Logistic Regression
    • Multi-Layer Perceptron (MLP) Neural Network
    • Random Forest Classifier
  • The selected model was used to learn a classifier from an imbalanced dataset of credit card transactions and predict fraudulent transactions
  • This makes it possible to examine the class that would usually be overlooked due to the skewed distribution, which more often than not is the class of actual interest
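To see why plain accuracy misleads on imbalanced data, consider a toy fraud dataset and a "model" that always predicts the majority class (a hand-rolled illustration, not the scikit-learn pipeline used in the project):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives the model found."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0

# 98 legitimate transactions (class 0) and only 2 fraudulent ones (class 1).
y_true = [0] * 98 + [1] * 2
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.98 -- looks excellent on paper
print(recall(y_true, y_pred))    # 0.0  -- misses every fraudulent transaction
```

This is why class-sensitive metrics such as recall, not raw accuracy, drive the model comparison on imbalanced datasets.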
Connect 4 Game against AI agent
A Python program that allows the user to play a game of connect-4 against different AI agents that utilise different algorithms for decision-making to make the next best move. Connect-4 is a solved game, so the player can win against the agent by playing the right moves.

nadim365/AI_projects

Technology Stack for this project:
Python
Highlights:
  • Developed a Connect 4 game in Python to implement an AI agent
  • The game incorporates various algorithms such as MiniMax (depth-limited), Alpha-Beta pruning, and Expectimax
  • Each algorithm improves upon the previous one; Alpha-Beta pruning, for example, lets us search faster as the trees get deeper by efficiently pruning branches that cannot affect the result
  • Expectimax is a probabilistic algorithm that takes into account the probability of the opponent making a given move and the likelihood of the agent winning, losing, or drawing to determine its choices
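Alpha-Beta pruning can be sketched over a toy game tree where leaves are static evaluations (a generic illustration, not the project's Connect-4 board representation):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning over a toy game tree.
    Internal nodes are lists of children; leaves are numeric evaluations."""
    if not isinstance(node, list):
        return node  # leaf: static evaluation of the position
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the minimizing opponent avoids this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff: the maximizer already has a better option
    return value
```

In the tree `[[3, 5], [2, 9]]` the second subtree is cut off after seeing the leaf 2, since the maximizer already has a guaranteed 3 from the first subtree; that is the pruning that keeps deep Connect-4 searches fast.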
Dictionary Client
A Java project to learn about how dictionary servers communicate using the DICT protocol. TCP is used as the communication medium to send and receive requests.

nadim365/EECS_3214_projects

Technology Stack for this project:
Java
Highlights:
  • Developed a client-side dictionary application that sends requests to various dictionary servers and parses their responses
  • Utilized the Java Socket API to establish communication between the client and the dictionary servers
  • Referenced the Dictionary Server Protocol as specified in RFC 2229 to make the appropriate requests to the available dictionary servers and display the results in the GUI
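For illustration, parsing one DEFINE reply per RFC 2229 might look like the following sketch (written in Python for brevity, though the project itself is in Java; status-line handling is simplified). A 151 status line introduces each definition, and a lone "." terminates its text:

```python
def parse_define_response(lines):
    """Collect definition bodies from a DICT DEFINE reply (RFC 2229).
    151 lines introduce a definition; a lone '.' terminates its text."""
    definitions, current = [], None
    for line in lines:
        if line.startswith("151"):
            current = []              # start of a new definition body
        elif line == ".":
            if current is not None:
                definitions.append("\n".join(current))
                current = None        # end of this definition
        elif current is not None:
            current.append(line)      # body line of the current definition
    return definitions
```

The Java client does the same state tracking while reading lines off the TCP socket, then renders each collected definition in the GUI.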