Last year, I wrote a blog post on the development and release of Type4Py, a machine learning model for code. In a nutshell, it predicts type annotations for Python source code files and enables developers to add types to their codebases gradually. When Type4Py was first released, its deployment was quite simple: I didn’t use containerization (Docker) or Kubernetes, and the model was deployed on a single machine. This initial approach had two clear downsides. First, I could not easily deploy the ML model and its pipeline on another machine, because I had to install Type4Py and its dependencies on every new machine. Second, the ML application could not scale well, since a single machine’s resources are limited.
Over the past decade, machine learning (ML) has been applied successfully to a variety of tasks, such as computer vision and natural language processing. Motivated by this, researchers have recently employed ML techniques to solve code-related problems, including, but not limited to, code completion, code generation, program repair, and type inference.
Dynamically typed languages like Python, as well as TypeScript (a typed superset of JavaScript), allow developers to optionally define type annotations and gain the advantages of static typing, such as better code completion and early bug detection. However, retrofitting types is a cumbersome and error-prone process. To address this, we propose Type4Py, an ML-based type auto-completion tool for Python. It assists developers in gradually adding type annotations to their codebases. In the following, I describe Type4Py’s pipeline, model, and deployments, the development of its VSCode extension, and more.
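To make gradual typing concrete, here is a minimal, illustrative sketch of the same Python function before and after adding type annotations. The function names are hypothetical, and the annotated signature is only the kind of output a type auto-completion tool aims for, not actual Type4Py output.

```python
from typing import List


# Step 1: an unannotated function. This is valid Python, but a static
# type checker can infer little about its argument or return value.
def average(values):
    return sum(values) / len(values)


# Step 2: the same function with type annotations added. Annotated and
# unannotated functions can coexist in one codebase (gradual typing).
def average_typed(values: List[float]) -> float:
    return sum(values) / len(values)


# Annotations are not enforced at runtime, so both calls behave identically.
print(average([1.0, 2.0, 3.0]))        # 2.0
print(average_typed([1.0, 2.0, 3.0]))  # 2.0
print(average_typed.__annotations__["return"])  # <class 'float'>
```

The benefit shows up with a static checker such as mypy: a call like `average_typed(["a", "b"])` would be flagged as an incompatible argument type, whereas the unannotated `average` gives the checker nothing to verify.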
In this post, I share my thoughts on the importance of writing skills in academia. I’m currently a PhD student, so a large part of my job is writing papers and, eventually, a thesis. I don’t consider myself a good writer by any means, though some people have complimented my writing. This post is mostly useful for novice researchers, such as PhD students, and for people considering a research career in academia.
I’m back again with another post after a long time! I’m currently a second-year PhD student in software engineering at TU Delft. Since last summer, I’ve wanted to write a blog post giving some tips to PhD students. You might call me a “lazy person”! 😀 I did tweet some short tips a while ago, but on Twitter, it’s not possible to explain things and give examples. I’m also writing this post to remind myself of the tips below, given that I haven’t finished my PhD yet. Without further ado, let’s talk about the tips:
Nowadays, many people use scikit-learn for their machine learning projects, because it is a top-quality ML package for Python that lets you use a machine learning algorithm in just a few lines of code, which is great!
As a machine learning researcher, I like to try out other machine learning libraries, since it’s good to have several of them in your arsenal. Because I was using C++ for my projects, I decided to try a C++ machine learning library.
To stay up-to-date in your field of research or study, you should read research papers. However, reading a research paper is not like reading a newspaper, because a paper has a formal structure consisting of several sections. Authors of research papers know this structure quite well, and it is also essential for readers to be familiar with the main sections of a paper. This familiarity helps readers quickly find the information they are looking for and better comprehend the paper’s new ideas and methods. Aside from providing new knowledge, reading research papers helps you find out what has been done in the past to solve a particular problem. That way, you avoid reinventing the wheel, and you may even be able to implement a method or algorithm proposed in a paper.
In this post, I explain the components of each section of a research article. I also provide examples from a real, open-access machine learning paper to help you understand the components of each section.
Currently, many people want to learn about machine learning, because they see impressive, intelligent applications from big tech companies in the media. To learn about this appealing subject, there are numerous textbooks and tutorials out there. However, machine learning textbooks often run to more than 500 pages. Moreover, these books are written for a technical audience, that is, readers who have a degree in computer science, mathematics, or engineering. Even CS graduates often find some machine learning topics hard to grasp.
I recently introduced the LightTwinSVM program on my blog (if you haven’t read it, check out this post). It is a fast and simple implementation of the TwinSVM classifier. Some people might ask why they should use this program over other popular SVM implementations, such as LIBSVM and scikit-learn. The short answer is that TwinSVM achieves better accuracy than SVM in most cases.
To show the effectiveness of the LightTwinSVM program in terms of accuracy, experiments were conducted on 10 benchmark datasets from the UCI repository.
Support Vector Machine (SVM) is a popular, state-of-the-art classification algorithm. Hence, many packages and implementations of standard SVM can be found on the internet. However, there are some interesting extensions of SVM that have slightly better prediction accuracy. Of these extensions, Twin Support Vector Machine (TSVM) has received particular attention from scholars in the field of SVM research. I myself have published a classifier based on TSVM and KNN.
TSVM performs classification using two non-parallel hyperplanes, as opposed to the single hyperplane of standard SVM (to learn more about TSVM, you can read this blog post). Unlike SVM, TSVM had almost no fast and simple implementations available on the internet prior to 2018, so I decided to develop the LightTwinSVM program and share it with others for free.
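To make the idea of two non-parallel hyperplanes concrete, here is a minimal sketch of how a trained linear TSVM classifies a new sample: it computes the perpendicular distance from the sample to each class’s hyperplane, w_k^T x + b_k = 0, and assigns the class whose hyperplane is nearest. The hyperplane values below are made up for illustration; in practice they come from solving TSVM’s two optimization problems, which this sketch does not do. The function names are hypothetical and are not part of LightTwinSVM’s API.

```python
import math
from typing import Sequence


def distance_to_hyperplane(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Perpendicular distance from point x to the hyperplane w^T x + b = 0.

    Assumes w is not the zero vector.
    """
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm_w


def tsvm_predict(x: Sequence[float],
                 w1: Sequence[float], b1: float,
                 w2: Sequence[float], b2: float) -> int:
    """Return +1 if x is closer to hyperplane 1, otherwise -1 (ties go to +1).

    Each TSVM hyperplane is fitted to lie close to the samples of its own
    class and far from the other class, so the nearest one indicates the class.
    """
    d1 = distance_to_hyperplane(x, w1, b1)
    d2 = distance_to_hyperplane(x, w2, b2)
    return 1 if d1 <= d2 else -1


# Hypothetical, untrained hyperplanes for a 2-D problem: class +1 lies near
# the line x2 = 0 and class -1 lies near the line x1 = 0 (non-parallel).
w1, b1 = [0.0, 1.0], 0.0  # hyperplane 1: x2 = 0
w2, b2 = [1.0, 0.0], 0.0  # hyperplane 2: x1 = 0

print(tsvm_predict([5.0, 0.5], w1, b1, w2, b2))  # 1  (distances 0.5 vs 5.0)
print(tsvm_predict([0.5, 5.0], w1, b1, w2, b2))  # -1 (distances 5.0 vs 0.5)
```

Because each class has its own hyperplane, the decision depends on relative distances to two planes rather than on which side of a single boundary a sample falls, as in standard SVM.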
Support Vector Machine (SVM) was proposed by Cortes and Vapnik in 1995. It is a very popular and powerful classification algorithm. The main idea of SVM is to find the optimal separating hyperplane between two classes, that is, the hyperplane with the maximum margin to the nearest training samples of each class. Due to its strong classification ability, SVM has been applied to a wide variety of applications.
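For reference, finding the optimal separating hyperplane can be written as the standard hard-margin SVM problem, assuming linearly separable training samples x_i with labels y_i in {-1, +1}, for i = 1, ..., n. Minimizing the norm of w maximizes the margin 2 / ||w|| between the two classes; the soft-margin version adds slack variables to tolerate misclassified samples, which this sketch omits.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b} \quad & \frac{1}{2}\|\mathbf{w}\|^2 \\
\text{s.t.} \quad & y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1, \quad i = 1, \dots, n
\end{aligned}
```

A new sample x is then assigned to class sign(w^T x + b), i.e., by the side of the single hyperplane on which it falls.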
Over the past decade, scholars have proposed several classifiers based on SVM. Among these extensions of SVM, I’d like to introduce the Twin Support Vector Machine (TSVM), because it has received more attention than the others.
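As a sketch of the standard linear TSVM formulation, let matrix A hold the samples of class +1 (one per row) and matrix B the samples of class -1, let e_1 and e_2 be vectors of ones of matching lengths, let c_1, c_2 > 0 be penalty parameters, and let the ξ vectors be slack variables. Instead of one large problem, TSVM solves two smaller quadratic programming problems, one per hyperplane:

```latex
\begin{aligned}
\text{(TSVM1)}\quad \min_{\mathbf{w}_1,\, b_1,\, \boldsymbol{\xi}_2} \quad & \tfrac{1}{2}\left\|A\mathbf{w}_1 + \mathbf{e}_1 b_1\right\|^2 + c_1 \mathbf{e}_2^\top \boldsymbol{\xi}_2 \\
\text{s.t.} \quad & -\left(B\mathbf{w}_1 + \mathbf{e}_2 b_1\right) + \boldsymbol{\xi}_2 \ge \mathbf{e}_2, \quad \boldsymbol{\xi}_2 \ge \mathbf{0} \\[6pt]
\text{(TSVM2)}\quad \min_{\mathbf{w}_2,\, b_2,\, \boldsymbol{\xi}_1} \quad & \tfrac{1}{2}\left\|B\mathbf{w}_2 + \mathbf{e}_2 b_2\right\|^2 + c_2 \mathbf{e}_1^\top \boldsymbol{\xi}_1 \\
\text{s.t.} \quad & \left(A\mathbf{w}_2 + \mathbf{e}_1 b_2\right) + \boldsymbol{\xi}_1 \ge \mathbf{e}_1, \quad \boldsymbol{\xi}_1 \ge \mathbf{0}
\end{aligned}
```

In each problem, the first term keeps the hyperplane close to the samples of its own class, while the constraints push the samples of the other class at least a unit distance away. Each problem only has as many constraints as the other class has samples, which is why the two problems are smaller than the single SVM problem. A new sample x is assigned to the class k (1 or 2) whose hyperplane is nearest, i.e., the one minimizing |w_k^T x + b_k| / ||w_k||.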