Industrial Applications: Big Bird is a new approach to solving different sequence modeling problems, such as:
Problem statement: Why Big Bird, when we already have BERT, ELMo, and their variants? Current Transformer-based models such as BERT use a self-attention architecture and have been among the most successful deep learning models for NLP problems. Unfortunately, one of their core limitations is a quadratic dependency on sequence length (mainly in terms of memory), which comes from their full attention mechanism.
In simple words, self-attention compares every word in the sequence with every other word, so the number of comparisons grows with the square of the sequence length…
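To make the quadratic growth concrete, here is a minimal sketch that counts the pairwise scores full self-attention computes for a few sequence lengths. The function names are hypothetical and chosen only for illustration, and the memory figure assumes 4-byte (float32) scores for a single attention head in a single layer; real models multiply this by the number of heads, layers, and the batch size.

```python
def attention_score_count(seq_len: int) -> int:
    """Pairwise token comparisons in full self-attention: every token vs. every token."""
    return seq_len * seq_len


def attention_score_memory_bytes(seq_len: int, bytes_per_score: int = 4) -> int:
    """Memory for one seq_len x seq_len score matrix (assumes float32 by default)."""
    return attention_score_count(seq_len) * bytes_per_score


if __name__ == "__main__":
    for n in (512, 1024, 2048, 4096):
        scores = attention_score_count(n)
        mib = attention_score_memory_bytes(n) / 2**20
        # Doubling the sequence length quadruples both numbers.
        print(f"seq_len={n:5d}  scores={scores:10d}  memory={mib:6.1f} MiB")
```

Going from 512 to 4096 tokens is an 8x increase in length but a 64x increase in attention scores, which is the limitation described above.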
Applying ML at Scale