My research lies at the intersection of multi-modal learning with limited data and interpretable machine learning, with a particular focus on learning from the natural geometry of a problem. I am passionate about creating automated systems for applications in healthcare, robotics, and social good, with a strong emphasis on geometric (e.g., graph) representation learning, reinforcement learning, and machine perception.
I recently completed my studies in Robotics and Mechatronics Engineering at the University of Dhaka, where I currently work as a Research Assistant under the guidance of Dr. Sejuti Rahman. I have also joined ACI Limited as a Machine Learning Engineer, where I work on several computer vision projects, namely PrescriptionOCR, Swapno POS/shoplifting theft detection, and BengaliOCR.
Outside of my research pursuits, I find great enjoyment in reading non-fiction books, particularly those related to moral philosophy and psychology. I am also an avid follower of soccer and chess, with Lionel Messi (G.O.A.T.!) and Magnus Carlsen being among my favorite players.
Master's in Robotics and Mechatronics Engineering (2022 - 2023)
University of Dhaka
Undergrad Student in Robotics and Mechatronics Engineering (2017 - 2021)
University of Dhaka
Zero-shot learning (ZSL) has emerged as a promising approach to categorizing unseen objects without labeled instances by leveraging external knowledge sources. However, in the domain of 3D object recognition, these sources often contain irrelevant information that hinders accurate object representation. Moreover, generalized zero-shot learning (GZSL) faces persistent challenges with a bias towards seen classes. To address these issues, we propose two main contributions: First, we introduce a novel multimodal embedding framework that enhances the quality of semantic information used in 3D object recognition. This framework aligns visual data from 2D images of 3D objects with graph-based semantic information from knowledge graphs and jointly learns embeddings from different modalities via an attention-based fusion module. These improved alignments and refined knowledge enable us to generate more accurate and discriminative attribute descriptions for each 3D object class. Second, to overcome the bias towards seen classes in GZSL, we propose a semi-supervised contrastive module combined with a generative adversarial network (GAN). This module integrates instance-level contrastive embedding and supervision, enhancing feature-level representations of unseen 3D object classes. Our approach improves generalization in GZSL scenarios, addressing the long-standing challenge of bias towards seen classes. We conduct extensive experiments on four state-of-the-art 3D object recognition datasets and evaluate performance in both inductive ZSL and GZSL tasks. The experimental results demonstrate the effectiveness of our method, achieving a 3% increase in the harmonic mean metric on the ModelNet40 dataset compared to the strongest baseline, 3DGenZ, and significant improvements over other state-of-the-art models.
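The harmonic mean mentioned above is usually computed in GZSL from the per-class accuracies on seen and unseen classes. The snippet below is a minimal sketch of that standard definition, not code from the project; the function name and the example accuracies are illustrative assumptions.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen-class and unseen-class accuracy.

    Assumes the usual GZSL convention H = 2 * S * U / (S + U).
    Both inputs are expected to be fractions in [0, 1].
    """
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0  # both accuracies are zero, so H is defined as zero
    return 2 * seen_acc * unseen_acc / total


# Hypothetical accuracies: a model biased towards seen classes scores a low H
# even though its seen-class accuracy is high.
biased = harmonic_mean(0.8, 0.2)
balanced = harmonic_mean(0.5, 0.5)
print(f"biased H = {biased:.2f}, balanced H = {balanced:.2f}")
```

Because the harmonic mean is dominated by the smaller of the two accuracies, it penalizes a model that does well on seen classes only, which is why it is the common headline metric for GZSL.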
During the COVID-19 pandemic, we realized the importance and necessity of automation in hospitals and healthcare facilities. Robotics has already established itself as a necessity in the medical field, ranging from automatic diagnostic systems to assisting nurses in healthcare facilities, thus reducing the strain on physicians and nurses and increasing diagnostic accuracy. Our project utilizes state-of-the-art advancements in vision-based action recognition, human-robot interaction, artificial intelligence, and deep learning to build an autonomous, feature-rich hospital and clinic aid robot. The proposed Intelligent Hospital Assistance Robot (IHABOT) is equipped with a number of autonomous features. First, it can map its surroundings, determine the best route to its destination, and navigate by itself in a real-world environment. Second, it has a variety of sensors to collect physiological data from the patient, including temperature, systolic and diastolic blood pressure, oxygen saturation, and pulse rate. IHABOT uses artificial intelligence (AI) to automatically evaluate these physiological measures and detect deteriorating patients early, allowing for prompt treatment and a reduction in significant adverse events. Third, in the absence of a doctor, the robot can track and assess patients’ performance on exercises throughout post-stroke therapy. It delivers a performance score that aids both patient self-evaluation and medical professionals’ evaluation of patient progress and prescription of required actions. Last but not least, IHABOT diagnoses COVID-19 from radiography images and CT scans using a novel few-shot learning-based method. Conventional high-accuracy diagnostic techniques are often expensive and sophisticated, requiring skilled individuals for specimen collection and screening, which limits their reach. Therefore, medical robots like IHABOT, which do not require direct human intervention, can be used in hospitals for automated diagnosis and to lessen the likelihood of infection spreading through reduced human-to-human contact.
In today’s digital world, automated sentiment analysis from online reviews can contribute to a wide variety of decision-making processes. One example is examining typical perceptions of a product based on customer feedback to better understand consumer expectations, which can help enhance everything from customer service to product offerings. Online review comments, however, frequently mix different languages, use non-native scripts, and do not adhere to strict grammar norms. For a low-resource language like Bangla, the lack of annotated code-mixed data makes automated sentiment analysis more challenging. To address this, we collect online reviews of different products and construct an annotated Bangla-English code-mixed (BE-CM) dataset. On our sentiment corpus, we also compare several alternative models from the existing literature. We present a simple but effective data augmentation method that can be used with existing word embedding algorithms, without the need for a parallel corpus, to improve cross-lingual contextual understanding. Our experimental results suggest that training word embedding models (e.g., Word2vec, FastText) with our data augmentation strategy helps the model capture cross-lingual relationships in code-mixed sentences, thereby improving the overall performance of existing classifiers in both supervised learning and zero-shot cross-lingual adaptability. Through extensive experimentation, we found that XGBoost with FastText embeddings trained using our proposed data augmentation method outperforms the alternative models in automated sentiment analysis on the code-mixed Bangla-English dataset, with a weighted F1 score of 87%. This project is a collaboration with the Centre for Advanced Research in Strategic Human Resource Management (CARSHRM), University of Dhaka.
Health professionals often prescribe specific exercises for the rehabilitation of patients with several conditions (e.g., stroke, Parkinson's disease, back pain). When patients perform those exercises in the absence of an expert (e.g., a physician or therapist), they cannot assess the correctness of their performance. Automatic assessment of physical rehabilitation exercises aims to assign a quality score given an RGB-D or RGB video of the body movement as input. Recent deep learning approaches address this problem by extracting CNN features from coordinate grids of skeleton data (body joints) obtained from videos. However, they cannot extract rich spatio-temporal features from variable-length inputs. To address this issue, we investigate Graph Convolutional Networks (GCNs) for this task. We adapt a spatio-temporal GCN to predict continuous assessment scores instead of discrete class labels. Our model can process variable-length inputs so that users can perform any number of repetitions of the prescribed exercise. Moreover, our novel design also provides self-attention over body joints, indicating their role in predicting assessment scores. It guides the user to achieve a better score in future trials by matching the attention weights of expert users. Our model outperforms existing exercise assessment methods on the KIMORE and UI-PRMD datasets.
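The idea of scoring a skeleton sequence of any length can be illustrated with a much simplified sketch. It assumes a single spatial graph convolution per frame followed by mean pooling over all frames and joints and a linear regression head; the temporal convolutions and self-attention of the actual model are omitted, and every name, weight, and the toy 3-joint skeleton below are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(edges, num_joints):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def graph_conv(frame, a_hat, weight):
    """One spatial graph convolution with ReLU: relu(A_hat @ X @ W)."""
    hidden = matmul(matmul(a_hat, frame), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def assess(frames, a_hat, weight, head):
    """Map a skeleton sequence of any length to one continuous score."""
    feats = [graph_conv(frame, a_hat, weight) for frame in frames]
    count = len(frames) * len(a_hat)  # frames x joints
    # Mean pooling yields a fixed-size vector regardless of sequence length.
    pooled = [sum(row[k] for f in feats for row in f) / count
              for k in range(len(weight[0]))]
    return sum(p * h for p, h in zip(pooled, head))


# Toy setup (all values are made up): 3 joints in a chain, 2 features per joint.
A_HAT = normalized_adjacency([(0, 1), (1, 2)], num_joints=3)
WEIGHT = [[0.5, -0.2], [0.1, 0.3]]  # 2 input features -> 2 hidden features
HEAD = [1.0, 2.0]                   # linear regression head
short_clip = [[[0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]]  # a single frame
long_clip = short_clip * 5                            # five frames

print(assess(short_clip, A_HAT, WEIGHT, HEAD))
print(assess(long_clip, A_HAT, WEIGHT, HEAD))
```

The pooling step is what lets the same weights handle one repetition or many: the regression head always sees a vector of the same size, and its output is a real-valued score rather than a class label.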
This project focuses on the development and evaluation of an Artificial Intelligence (AI) agent for optimized stock trading. Using the Deep Deterministic Policy Gradient (DDPG) algorithm and other relevant techniques, the agent's primary objective is to learn an optimal policy that maximizes the profit from its actions and corresponding positions in the stock market. To validate the agent's effectiveness and versatility, we tested it comprehensively on datasets from two distinct stock markets: the S&P 500 and the Dhaka Stock Exchange (DSE). The results are expected to offer valuable insights into AI-driven stock trading strategies applicable across diverse financial markets. This project is a collaboration with the Centre for Advanced Research in Strategic Human Resource Management (CARSHRM), University of Dhaka.
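Two building blocks of DDPG in general are the bootstrapped critic target and the slow "soft" update of the target networks. The sketch below shows both on plain Python numbers; it is not the project's implementation, and the function names, the default values of `gamma` and `tau`, and the example numbers are assumptions chosen for illustration.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic target y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end.

    next_q is assumed to be the target critic's value for the target actor's action.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]


# Hypothetical step: a reward of 1.0 (e.g., profit on one trade) and a
# target-critic estimate of 10.0 for the next state.
print(td_target(1.0, 10.0, done=False, gamma=0.5))
print(soft_update([0.0, 2.0], [1.0, 2.0], tau=0.1))
```

Keeping `tau` small makes the target networks trail the online networks slowly, which stabilizes the critic's regression targets during training.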