Intro

Courseman is an automated course generation and recommendation platform, aimed primarily at budding developers, that takes a user's learning interests and suggests a learning roadmap for them. Ideally, the roadmap updates itself based on the user's learning history, progress and engagement with recommended content.

Tech Stack

Frontend: React.js, Redux, Bootstrap 5, HTML5, CSS3

Backend APIs: Express.js, FastAPI

Databases: MongoDB

Tools & Infra: MongoDB Atlas, Atlassian Bitbucket, Figma, Adobe XD

Languages: JavaScript, Python

Why this tech stack?

Frontend

Since the frontend has quite a few reusable components and a modern UI, React is a natural choice. Because it is a library rather than a full framework, it leaves us free to pick different open-source toolkits for different parts of the frontend requirements.

Redux centralises the frontend data store, serving as a single source of truth that is accessible throughout the component tree. Additionally, if used correctly, it can work as a lightweight cache to reduce repeated network calls.

Backend

Python has a mature ecosystem for compute-intensive work such as scraping, machine learning and NLP, which makes it a better fit for these workloads than JavaScript. Thus, the scrapers, machine learning and NLP modules are exposed as Python FastAPI endpoints.

JavaScript, on the other hand, handles async I/O-bound workloads well natively, so the Express.js APIs serve most of the user-facing endpoints, which involve more database operations than computation.

Infrastructure

Unlike GitHub, Bitbucket offers project-wise organisation of multiple repositories under its free tier. As this application is intended to consist of multiple microservices and deployable units in the future, Bitbucket seems like a good alternative. Plus, what's the harm in trying something new!

Journey

This section documents the development journey of the application from start to present through different stages of engineering decisions.

v1.0.0 - Pilot

A user signs up on the platform and chooses topics of interest from a list of available topics, as a part of their initial profile setup. This data is used for generating the initial learning roadmap for the user.

After a successful sign-up or sign-in, the user is taken to a dashboard that lists a learning roadmap divided into chapters and topics, with each topic containing the top related videos on YouTube.

Implementation

The implementation can be divided into 2 main engines:

  • Content Aggregation Engine
  • User-based Recommender Engine

Data Flow Engines

Content Aggregation Engine

The content aggregation engine comprises the background processes that scrape data from our primary content source, YouTube, and dump it into the raw data lake on MongoDB.

The backbone of the content aggregation engine is a custom YouTube scraper service built with Python and BeautifulSoup. The service exposes 2 main endpoints:

  • Search - returns the video IDs of the top 10 videos relevant to a search term.
  • Video Details - returns the details of a video given a valid video ID.
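
As a rough illustration of the core step behind the Search endpoint, the sketch below pulls up to 10 unique video IDs out of an already-fetched results page. It is not the project's actual scraper (which uses BeautifulSoup); the function name `extract_video_ids`, the regex, and the assumption that IDs show up as 11-character `watch?v=` values are all hypothetical and chosen only to keep the example stdlib-only.

```python
import re

# Assumption: video IDs appear in the page as 11-character "watch?v=" values.
VIDEO_ID_PATTERN = re.compile(r"watch\?v=([A-Za-z0-9_-]{11})")


def extract_video_ids(html: str, limit: int = 10) -> list[str]:
    """Return up to `limit` unique video IDs in order of first appearance."""
    if limit <= 0:
        return []
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops duplicates
    for match in VIDEO_ID_PATTERN.finditer(html):
        seen.setdefault(match.group(1), None)
        if len(seen) == limit:
            break
    return list(seen)
```

A Video Details handler could follow the same pattern: fetch the page for one ID, parse out the fields of interest, and return them as a dictionary.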

To avoid scraping the same data repeatedly on each user request, a cron job invokes the scraper service periodically against a custom database of topics and related keywords, keeping a dump of the YouTube data in our data lake on MongoDB Atlas.
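
The periodic refresh might look roughly like the sketch below. Everything here is an assumption made for illustration: the `refresh_data_lake` name, the topic-to-keywords mapping, the injected `search` and `fetch_details` callables (stand-ins for the two scraper endpoints), and a plain dict standing in for the MongoDB collection.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical shapes: topic -> keywords, and a dict standing in for a MongoDB collection.
Topics = dict[str, list[str]]
DataLake = dict[str, dict]


def refresh_data_lake(
    topics: Topics,
    search: Callable[[str], list[str]],
    fetch_details: Callable[[str], dict],
    lake: DataLake,
) -> int:
    """Scrape every keyword once and upsert the results keyed by video ID.

    Returns the number of records written in this run.
    """
    written = 0
    for topic, keywords in topics.items():
        for keyword in keywords:
            for video_id in search(keyword):
                record = {
                    **fetch_details(video_id),
                    "topic": topic,
                    "keyword": keyword,
                    "scraped_at": datetime.now(timezone.utc).isoformat(),
                }
                lake[video_id] = record  # upsert: re-runs overwrite, never duplicate
                written += 1
    return written
```

Keying the store by video ID means repeated cron runs refresh existing records instead of piling up duplicates. Against a real MongoDB collection, the dict assignment would presumably become an upsert such as pymongo's `update_one(..., upsert=True)`.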

Note: We were broke and could not afford to pay for the YouTube Data API, and since this is an MVP, we took the hacky route to keep costs down until we actually ship to production.

User-based Recommender Engine

The user-based recommender engine takes care of storing and processing user data to suggest relevant content from the content aggregation engine.
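
To make the idea concrete, here is a minimal sketch of how an initial roadmap could be assembled from a user's chosen interests, following the chapters, topics and videos layout described for the dashboard. The `Topic` and `Chapter` classes, the `build_initial_roadmap` function, and the `catalog` and `videos_by_topic` mappings are hypothetical names, not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    video_ids: list[str] = field(default_factory=list)


@dataclass
class Chapter:
    title: str
    topics: list[Topic] = field(default_factory=list)


def build_initial_roadmap(
    interests: list[str],
    catalog: dict[str, list[str]],
    videos_by_topic: dict[str, list[str]],
    top_n: int = 3,
) -> list[Chapter]:
    """Build one chapter per chosen interest; each topic keeps its top `top_n` videos.

    `catalog` maps an interest to its ordered topics, and `videos_by_topic`
    maps a topic to video IDs already ranked by the aggregation engine.
    """
    roadmap = []
    for interest in interests:
        topic_names = catalog.get(interest)
        if topic_names is None:
            continue  # skip interests that are not in the available topic list
        topics = [
            Topic(name, videos_by_topic.get(name, [])[:top_n])
            for name in topic_names
        ]
        roadmap.append(Chapter(interest, topics))
    return roadmap
```

A later version could re-rank or reorder the chapters using learning history and engagement signals; this sketch covers only the initial, interest-based roadmap.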