Hello, my name is
Mahmoud Mayaleh
And I'm an AI Engineer
Reach out

About me

I'm Mahmoud, an AI engineer

AI Engineer with a Master’s in Artificial Intelligence, skilled in deep learning, NLP, and large-scale model training. Background in distributed AI systems and applied ML research.

LinkedIn · View CV

My Journey

Research Engineer

CNAM - Paris

  • Researched RDMA traffic from LLM training to design scheduling strategies, model topologies, and compute/communication overlap using ns-3 and SimAI.
  • Simulated GPT workloads (up to 175B parameters on 1,024 GPUs) with SimAI to optimize NCCL/RDMA, and prototyped MSCCL collectives for scalability.
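The scale figures above lend themselves to a quick sanity check. As a back-of-envelope sketch (not output of the ns-3/SimAI studies), the bandwidth term of a ring all-reduce can be estimated; the 350 GB gradient volume and 400 Gb/s link speed below are illustrative assumptions, not measured values:

```python
def ring_allreduce_time(model_bytes, n_gpus, link_gbps):
    """Estimate the bandwidth-bound time of a ring all-reduce:
    each GPU sends/receives 2*(n-1)/n of the gradient volume
    over its own link."""
    volume = 2 * (n_gpus - 1) / n_gpus * model_bytes
    return volume / (link_gbps * 1e9 / 8)  # bytes / (bytes per second)

# Illustrative: ~350 GB of fp16 gradients for a 175B-parameter model,
# 1024 GPUs, 400 Gb/s RDMA links.
t = ring_allreduce_time(350e9, 1024, 400)
```

Estimates like this give a lower bound that simulators such as SimAI then refine with topology, congestion, and overlap effects.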

Master's in Artificial Intelligence for Connected Industries

CNAM, Paris, France

Back End Developer

Ostim Technical University - IT department

  • Built 5+ AI-enhanced Odoo ERP modules, improving workflows and efficiency by 20%.
  • Automated back-end processes with Python and ML, cutting manual data entry by 40% and reducing errors.
  • Used AI-driven query optimization to speed up ERP load times by 25%.

Machine Learning Intern

ArkSigner

  • Built a YOLOv5-based object detection pipeline, achieving 90% accuracy.
  • Deployed AI-driven optimizations that boosted system performance by 30%.
  • Validated on custom datasets for robust deployment.

Bachelor of Engineering: Computer Engineering

Ostim Technical University, Ankara, Türkiye

Skills

Core stack

Applied AI/ML engineer focused on building end-to-end systems—data pipelines, model training/serving, and MLOps. Strong in Python + PyTorch, with hands-on work in NLP and Computer Vision.

Python
PyTorch
TensorFlow / Keras
scikit-learn
NLP (Transformers)
Computer Vision (OpenCV)

Publications

Visualization from the sentiment augmentation study

Enhancing Sentiment Classification on Small Datasets through Data Augmentation and Transfer Learning

This work provides a unified benchmark of Easy Data Augmentation (EDA), back-translation, and contextual token substitution (NLPaug) on low-resource IMDb sentiment data, evaluated with Logistic Regression, Random Forest, and BERT under identical experimental conditions.
Abstract:

Small-scale sentiment classification suffers from data scarcity, which limits model generalization. This study systematically compares three text augmentation strategies under a controlled, reproducible framework.

  • EDA (Easy Data Augmentation) based on token-level synonym replacement, insertion, deletion, and swapping.
  • Back-translation using English↔French MarianMT models to create high-fidelity paraphrases.
  • Contextual token substitution (NLPaug-style) with pre-trained language models for semantics-preserving edits.
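For a sense of what the first strategy does in practice, here is a minimal synonym-replacement sketch in the spirit of EDA. The tiny `SYNONYMS` table is a stand-in for illustration; the published EDA method draws synonyms from WordNet and also applies insertion, deletion, and swapping:

```python
import random

# Toy synonym table (illustrative); EDA proper uses WordNet.
SYNONYMS = {"movie": ["film"], "great": ["excellent", "superb"], "bad": ["poor"]}

def eda_synonym_replace(text, n=1, seed=0):
    """Replace up to n words that have synonyms with a random synonym,
    using a fixed seed for reproducibility."""
    rng = random.Random(seed)
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)

augmented = eda_synonym_replace("a great movie", n=2, seed=0)
```

Back-translation and contextual substitution follow the same augment-then-relabel pattern, but produce whole-sentence paraphrases rather than token-level edits.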

Methods: Experiments use a 5,000-sample IMDb subset with 100% augmentation, 10-fold cross-validation, and fixed seeds, comparing traditional classifiers (Logistic Regression, Random Forest) and a fine-tuned BERT base model on accuracy, F1, AUC, and effect sizes.
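The fixed-seed, 10-fold protocol can be sketched with a deterministic fold split. A real run would use scikit-learn's StratifiedKFold on the actual IMDb subset; the seed value here is illustrative:

```python
import random

def kfold_indices(n, k=10, seed=42):
    """Deterministic k-fold split: shuffle indices with a fixed seed,
    then deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# 5,000 samples, 10 folds, fixed seed -> identical folds on every run.
folds = kfold_indices(5000, k=10, seed=42)
```

Fixing the fold assignment (and all model seeds) is what makes the accuracy, F1, AUC, and effect-size comparisons across classifiers reproducible.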

Results:

  • All augmentation strategies yield statistically significant gains over non-augmented baselines.
  • Contextual augmentation delivers the most consistent improvements for BERT, reaching about 97% test accuracy in the augmented setting.
  • EDA and back-translation provide larger relative gains for traditional models, especially Random Forest, while exhibiting different diversity-cost trade-offs.
Published in: Discover Artificial Intelligence (Springer), 2026
Read Publication
LipLingo CNN lip-reading model schematic

LipLingo: CNN Model for Lip Reading

LipLingo is a deep learning lip-reading system built on LipNet with enhanced preprocessing and spatio-temporal modeling. It achieves 95% character-level and 87% sentence-level accuracy, showing strong potential for assistive tech and biometrics.
Abstract:

Lip reading plays a crucial role in applications such as speech recognition, assistive technologies for the hearing-impaired, and biometric authentication. However, performance often degrades when the speaker varies, the environment changes, or the speech sounds are visually similar. To overcome these challenges, we propose LipLingo:

  • Based on the LipNet architecture
  • Enhanced with standardized preprocessing for consistent mouth-region representation
  • Uses rotational validation for better generalization
  • Combines 3D convolutional layers (spatio-temporal features) and bidirectional recurrent layers (sequence modeling)
  • Optimized with Adam optimizer and CTC loss
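The architecture bullets translate roughly into the following PyTorch sketch. Layer sizes and the 28-symbol vocabulary are illustrative assumptions, not the published configuration:

```python
import torch
import torch.nn as nn

class LipLingoSketch(nn.Module):
    """Minimal sketch: 3D conv front-end for spatio-temporal features,
    bidirectional GRU for sequence modeling, linear head producing
    per-frame logits for CTC. Sizes are illustrative."""
    def __init__(self, vocab_size=28):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep time
        )
        self.rnn = nn.GRU(input_size=32, hidden_size=64,
                          bidirectional=True, batch_first=True)
        self.head = nn.Linear(128, vocab_size)

    def forward(self, x):            # x: (batch, 3, frames, H, W)
        f = self.conv(x)             # (batch, 32, frames, H/2, W/2)
        f = f.mean(dim=(3, 4))       # pool spatial dims -> (batch, 32, frames)
        f = f.transpose(1, 2)        # (batch, frames, 32)
        out, _ = self.rnn(f)         # (batch, frames, 128)
        return self.head(out)        # (batch, frames, vocab)

logits = LipLingoSketch()(torch.randn(2, 3, 16, 32, 64))
```

Training would pair these per-frame logits with `nn.CTCLoss` and the Adam optimizer, as the bullets above describe.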

Results:

  • 95% character-level accuracy
  • 87% sentence-level accuracy
  • Outperforms baseline LipNet
Published: 2025
Read Publication

Projects

ATLAS: Adaptive Task-aware Federated Learning for LLMs

Diagram of federated learning with edge devices
Publication-ready federated learning framework enabling efficient multi-task LLM fine-tuning on heterogeneous edge devices.
View on GitHub

Highlights

  • Four-phase pipeline: task clustering, heterogeneous LoRA configuration, split federated learning, and Laplacian-based personalization.
  • Supports DistilBERT, BERT, RoBERTa, GPT-2 with real PyTorch + HuggingFace training.
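The "heterogeneous LoRA configuration" phase can be illustrated with a toy rank-assignment rule: weaker edge devices get smaller adapters so every client can participate. The memory thresholds and device names here are hypothetical, not ATLAS's actual policy:

```python
def assign_lora_rank(device_mem_gb, ranks=(4, 8, 16)):
    """Map a device's memory budget to a LoRA adapter rank.
    Thresholds are illustrative placeholders."""
    if device_mem_gb < 4:
        return ranks[0]
    if device_mem_gb < 8:
        return ranks[1]
    return ranks[2]

# Hypothetical fleet of heterogeneous clients.
config = {dev: assign_lora_rank(mem)
          for dev, mem in {"phone": 3, "jetson": 6, "workstation": 24}.items()}
```

In the real pipeline these per-client ranks would feed into HuggingFace PEFT `LoraConfig` objects before federated training begins.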

Tech Stack

Python · PyTorch · HuggingFace · PEFT/LoRA · Federated Learning

Git Commit Multi-Agent

Diagram of multi-agent AI pipeline for git commits
AI-powered multi-agent system that generates high-quality conventional Git commit messages from staged diffs using local LLMs.
View on GitHub

Highlights

  • Three-agent pipeline (DiffAgent, SummaryAgent, CommitWriterAgent) with shared state for robust commit generation.
  • Runs entirely on-device via Ollama + OpenChat 7B, with zero API cost and full privacy.
  • Supports dry-run, verbose mode, and programmatic usage for CI or tooling integration.
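The three-agent, shared-state design can be sketched as a plain pipeline. The LLM step is stubbed out here with a trivial summarizer; the real system prompts OpenChat 7B through Ollama and reads staged diffs via GitPython:

```python
def diff_agent(state):
    # Stage 1: extract changed files from the staged diff
    # (the real agent reads `git diff --cached` via GitPython).
    state["files"] = sorted({line[6:] for line in state["diff"].splitlines()
                             if line.startswith("+++ b/")})
    return state

def summary_agent(state):
    # Stage 2: condense the diff; stubbed here, the real agent
    # prompts a local LLM via Ollama.
    state["summary"] = f"update {', '.join(state['files'])}"
    return state

def commit_writer_agent(state):
    # Stage 3: emit a Conventional Commits message from the summary.
    state["message"] = f"feat: {state['summary']}"
    return state

state = {"diff": "+++ b/app.py\n+print('hi')\n+++ b/README.md\n+docs"}
for agent in (diff_agent, summary_agent, commit_writer_agent):
    state = agent(state)
```

Passing one mutable state dict through the stages is what lets each agent build on the previous one's output while staying independently testable.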

Tech Stack

Python · Ollama · OpenChat 7B · GitPython · CLI Tools

BERT Sentiment Augmentation

Sentiment analysis graph comparing models and augmentation
Published study on data augmentation and transfer learning for sentiment classification in low-resource settings.
View on GitHub

Highlights

  • Unified benchmark of EDA, contextual augmentation, and back-translation on a 5k IMDb subset.
  • Evaluates Logistic Regression, Random Forest, and BERT with rigorous statistics (CIs, t-tests, effect sizes).
  • Reproducible pipeline with fixed seeds, exported metrics, and augmentation quality analysis via sentence embeddings.

Tech Stack

Python · BERT · scikit-learn · NLPaug · Sentence-Transformers

LipLingo: CNN-Based Lip Reading

Frames of a mouth region used for lip reading
Computer vision model that performs lip reading from video frames for silent speech understanding.
View on GitHub

Highlights

  • CNN-based architecture tailored for visual-only speech recognition.
  • Preprocessing pipeline for mouth ROI extraction and frame normalization.
  • Positioned for accessibility and multimodal AI applications.

Tech Stack

Python · CNN · Computer Vision · Deep Learning

Deep RL for Autonomous Robot Navigation

Simulated TurtleBot navigating around obstacles
Deep reinforcement learning for autonomous TurtleBot3 navigation using PPO in ROS 2 and Gazebo.
View on GitHub

Highlights

  • Trains deep reinforcement learning (PPO) agents to navigate complex environments.
  • Uses ROS 2 for realistic robot control and system orchestration.
  • Runs in Gazebo with configurable, obstacle-rich simulation worlds.
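One small building block of any PPO setup is the discounted return from which advantage estimates are computed. This sketch is a generic reinforcement-learning component, not code from the project's ROS 2/Gazebo stack:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1}
    by scanning the episode's rewards backwards."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

# Toy 3-step episode with gamma=0.5 for easy hand-checking.
returns = discounted_returns([1.0, 0.0, -1.0], gamma=0.5)
```

PPO then compares these returns against a value-function baseline and clips the policy update, which is what Stable-Baselines-style trainers automate on top of the Gazebo environment.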

Tech Stack

Python · ROS 2 · Gazebo · PPO · Reinforcement Learning

NOVA_5G: Network Slicing Testbed

5G network diagram with core, gNB, and slices
5G Standalone testbed implementing network slicing using Open5GS core and srsRAN gNB with virtual UEs.
View on GitHub

Highlights

  • End-to-end 5G SA setup with Open5GS core and srsRAN gNB + virtual UE.
  • Implements and evaluates network slicing scenarios.
  • Foundation for experiments in edge computing and 6G research.

Tech Stack

Shell · Open5GS · srsRAN · 5G SA · Network Slicing

Contact me

Name
Mahmoud Mayaleh
Address
Paris, France
Email
mahmoud@mayaleh.com
Message me