This website is currently out of date. I would like to share my work with everyone to the fullest extent, but corporate business does not always allow that. Please check out my LinkedIn for the time being.
I am Kohki Mametani. I have a wide range of R&D experience in machine learning for audio, including Speech Synthesis, Music Information Retrieval, and an audio retrieval system, which I am currently working on for my Master's thesis. Aside from audio, I have 3 years of commercial experience in Natural Language Processing and OSS development (cross-platform desktop in Python, Android in Kotlin, and full-stack web development). I lead an international team for a language-teaching service and have succeeded in automating and outsourcing the production of educational videos. I share videos on YouTube for free and have grown our channel to 50k+ subscribers. Below is a showcase of my projects.
Studying similarity learning with triplet loss to produce expressive deep audio embeddings
Thesis: Diagnostic classifiers reveal context features hidden in End-to-End TTS
This work presents a novel analysis of hidden states of an End-to-End TTS system using eight criteria derived from the standard set of context features of parametric TTS. The paper was accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019.
- Implemented a CNN model for tempo detection in TensorFlow that runs on DJ equipment
- Added an object detection feature using YOLO9000 to a video search tool
- Built a browser automation tool with Selenium to collect audio data from the web and assembled a large training dataset
- Designed and built a browser-based image annotation tool using JavaScript and HTML5, which was used by online annotators
- Developed a phoneme segmentation tool based on HTK, which is used by other lab members and doubled the productivity of manual segmentation
- Worked on preparation for ICASSP 2019 and assisted undergraduate students for 2 months after graduation.
Joytan-REC aims to collect pronunciations from a crowd of language enthusiasts and use those voice recordings to develop free and fun language learning services.
Joytan Public is a place where we review user-generated voice recordings from Joytan-REC. In addition, the website provides a discussion forum and supplementary materials (online images, quizzes) for each of our videos.
Leading an international team of 50+ members to produce multilingual teaching videos. I built NLP and TTS tools to produce language-teaching materials based on bilingual corpora. The video production is highly automated and outsourced.
Joytan (ジョイ単) is a free, small cross-platform desktop application that facilitates the process of making audio/textbooks and helps people create their own original educational materials.
This is a Django project deployed on Heroku with Bootstrap as the front-end framework. While I designed a prototype with LaTeX and TikZ, PDF generation is powered by ReportLab in production. It may take a few seconds to reach the website because the app sleeps after 30 minutes of inactivity.
Pycraft is a Python clone of Minecraft. The program implements many basic features of the original, such as running, jumping, flying, and mining, yet the codebase is beautifully simplified thanks to Python. I contributed to the project by fixing several design flaws in the object-oriented program.
CGINC is a POV-Ray-like raytracer written in C. It started as a school project. I personally implemented 3 features: specular light (mirror effect), a model definition file (csgfile.txt), and a pipeline for rendering with MS Paint.