Project Proposal
Here, we briefly describe the project proposal as given by the guidelines.
| | |
| --- | --- |
| Title | Parallelized PCYK Algorithm for Syntactic Parsing |
| URL | https://thepulkitagarwal.github.io/pcyk/ |
| Members | Ansong Ni (ansongn) & Pulkit Agarwal (pulkitag) |
Summary
In this project, we aim to parallelize the PCYK algorithm for syntactic parsing on NVIDIA GPUs (using CUDA) and multicore machines (using OpenMP / Cilk).
Background
The Probabilistic Cocke-Kasami-Younger (PCYK)[^1] algorithm is a dynamic programming algorithm for parsing sentences with a grammar in Chomsky Normal Form (CNF).
An illustration of the PCYK algorithm parsing the natural language sentence “Fish people fish tanks” is shown in the figure below.
Figure 1: An example of the PCYK algorithm for syntactic parsing[^2]
It is a bottom-up algorithm that iteratively builds a binary parse tree from the words as terminal (i.e., leaf) nodes. The occurrences of binary and unary production rules are counted at training time so that they can be used to estimate the probability of a parse tree at test time. After PCYK parsing produces the chart, the best parse tree is recovered by greedily following the most probable children down from the root node.
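To make the recurrence concrete, here is a minimal sequential sketch of the chart-filling step, assuming a CNF grammar stored as a flat list of binary rules and a dense chart indexed by span boundaries. The names (`BinaryRule`, `fillChart`) are illustrative rather than taken from an existing implementation, and unary rules and backpointers are omitted.

```cpp
#include <vector>
#include <algorithm>

// Illustrative binary rule A -> B C with probability `prob`.
struct BinaryRule { int lhs, left, right; double prob; };

// chart[i][j][A]: probability of the best parse of words i..j-1 rooted at
// non-terminal A. The caller is assumed to have sized the chart and filled
// in all length-1 spans from the lexicon.
void fillChart(std::vector<std::vector<std::vector<double>>>& chart,
               const std::vector<BinaryRule>& rules, int numWords) {
    // Bottom-up over span lengths: every span of length L is finished
    // before any span of length L+1 begins.
    for (int length = 2; length <= numWords; length++) {
        for (int begin = 0; begin + length <= numWords; begin++) {
            int end = begin + length;
            // Try every split point and every binary rule A -> B C.
            for (int split = begin + 1; split < end; split++) {
                for (const BinaryRule& r : rules) {
                    double p = r.prob * chart[begin][split][r.left]
                                      * chart[split][end][r.right];
                    chart[begin][end][r.lhs] =
                        std::max(chart[begin][end][r.lhs], p);
                }
            }
        }
    }
}
```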
Different parts of this algorithm can benefit from parallelism, which we describe in more detail in the challenges section.
The Challenges
- The dependencies for computation are rather complicated, which leaves us plenty of opportunities to design the order of computation to maximize parallelism. Since it is a bottom-up algorithm, the computation for an upper-level tree node depends on some of the lower-level tree nodes. Moreover, the completion of nearby tree nodes makes it possible to start computing upper-level nodes that depend only on those nodes, without waiting for the whole level to complete (see the first sketch after this list).
- The workload from level to level is not uniform, which makes it challenging to design a load-balancing strategy for the parsing.
- The initial training procedure involves counting the occurrences of production rules, which can be done in many ways, but we need to consider the trade-offs between speed, memory, and accuracy across parallel paradigms (e.g., with or without locks, storing the counts in shared or thread-local memory; see the second sketch after this list).
- The parsing of a test set can be trivially parallelized via data parallelism, with each thread keeping a copy of the rule probabilities. However, across different parallel architectures (e.g., CUDA, OpenMP), we need to investigate whether this strategy is actually beneficial given bandwidth and memory constraints.
- There are multiple extensions to the PCYK algorithm that improve parsing accuracy but also make the algorithm more complicated, and thus harder to parallelize. If time permits, we will investigate how to parallelize the algorithm with these extensions to make it both fast and robust.
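To illustrate the first two challenges, below is a hedged OpenMP sketch of the simplest level-synchronous strategy, which parallelizes across spans of one length and synchronizes between lengths; `processSpan` is a hypothetical stand-in for the split/rule loops from the sequential sketch above, not part of any existing code.

```cpp
#include <omp.h>

// Hypothetical stand-in for the split/rule loops over one chart span
// (see the sequential sketch in the Background section).
void processSpan(int begin, int end) { (void)begin; (void)end; }

// Level-synchronous parallelization: all spans of the same length depend
// only on strictly shorter spans, so each level is a parallel-for with an
// implicit barrier at its end. This does NOT exploit the finer-grained
// opportunity noted above (starting an upper-level span as soon as its own
// children are done), which is one of the trade-offs we plan to study.
void fillChartLevelParallel(int numWords) {
    for (int length = 2; length <= numWords; length++) {
        // Spans of equal length are mutually independent; dynamic
        // scheduling helps with the non-uniform per-span workload.
        #pragma omp parallel for schedule(dynamic)
        for (int begin = 0; begin + length <= numWords; begin++) {
            processSpan(begin, begin + length);
        }
        // Implicit barrier here: level `length` completes before the next.
    }
}
```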
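For the rule-counting challenge, one lock-avoiding option is to give each thread a private count array and merge the arrays once at the end, trading extra memory for fewer synchronization points. The sketch below assumes a hypothetical pre-processed treebank in which each tree is flattened into a list of integer rule IDs; none of these names come from an existing codebase.

```cpp
#include <cstddef>
#include <vector>

// Count rule occurrences over a treebank where each tree is a flat list of
// rule IDs in [0, numRules). Each thread accumulates into a private array,
// so no locks are taken on the hot path.
std::vector<long> countRules(const std::vector<std::vector<int>>& treebank,
                             int numRules) {
    std::vector<long> total(numRules, 0);
    #pragma omp parallel
    {
        std::vector<long> local(numRules, 0);  // thread-private counts
        #pragma omp for nowait
        for (std::size_t t = 0; t < treebank.size(); t++)
            for (int rule : treebank[t])
                local[rule]++;
        // One merge per thread, serialized by the critical section; this
        // runs once per thread, not once per rule occurrence.
        #pragma omp critical
        for (int r = 0; r < numRules; r++)
            total[r] += local[r];
    }
    return total;
}
```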
Goals
Plan to Achieve
For the CUDA implementation of the PCYK algorithm, the paper Efficient Parallel CKY Parsing on GPUs by Yi et al. achieves a 26x speedup. We aim to achieve a similar speedup on the GHC machines.
Hope to Achieve
- Try to beat the 26x speedup on CUDA achieved by the paper.
- OpenMP / Cilk Implementation of PCYK Algorithm.
Deliverables
- Sequential Implementation of PCYK Algorithm (C++)
- CUDA Implementation of PCYK Algorithm
- (Hopefully) OpenMP / Cilk Implementation of PCYK Algorithm
Poster Session
- Speedup Graphs
- Live Demos / Outputs of our CUDA implementation (depending on the state of the GHC machines that day)
Platform Choice
We plan to use CUDA (GPU Architecture) and OpenMP / Cilk (Shared Memory Multicore Architecture).
Resources
- For understanding the PCYK algorithm, we plan to use the CMU Algorithms for NLP course and the Wikipedia article on CYK as references.
- For help with PCYK on CUDA, we will use the paper Efficient Parallel CKY Parsing on GPUs by Yi et al. as a reference for our implementation.
- We will also be using the CMU 15-418 course as a reference. More specifically, we will be using the CUDA Slides and Cilk Slides among others, as well as the CUDA Assignment and the OpenMP Assignment.
Hardware Resources
The main resources we would require are NVIDIA GPUs (for CUDA) and multicore machines (for OpenMP / Cilk). We believe that the GHC machines we used for our assignments should be sufficient for our project.
Planned Schedule
| # | Dates | Plan |
| --- | --- | --- |
| 1 | Oct 27 - Nov 02 | Meeting with Course Instructors, Project Proposal |
| 2 | Nov 03 - Nov 09 | Sequential Version of PCYK |
| 3 | Nov 10 - Nov 16 | CUDA Version of PCYK |
| 4 | Nov 17 - Nov 23 | Optimizations in the CUDA Version of PCYK |
| 5 | Nov 24 - Nov 30 | Optimizations in the CUDA Version of PCYK |
| 6 | Dec 01 - Dec 07 | "Hope to Achieve" Goals |
| 7 | Dec 08 - Dec 10 | Final Report + Poster Session |
[^1]: The PCYK algorithm is also known as the CYK algorithm or the CKY algorithm in the NLP community.
[^2]: Example borrowed from the slides of the CMU Algorithms for NLP course.