SuperGLUE and GLUE Benchmarks:
Natural Language Understanding-Based Intelligence Testing: Humans vs. Machines
The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems. The GLUE dataset was compiled in 2018 and comprises nine distinct tasks that gauge how well an AI model understands human language constructs, including logic, negation, inference, and the other traits a human candidate would use to choose correct responses.
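For reference, the nine GLUE tasks can be listed in a small Python dict. This is a minimal sketch for orientation only; the keys match the config names commonly used to download GLUE with the Hugging Face `datasets` library (e.g. `load_dataset("glue", "mrpc")`), noted here as a comment rather than executed:

```python
# The nine GLUE tasks, keyed by the config names commonly used to load them
# (e.g. with Hugging Face datasets: load_dataset("glue", "cola")).
GLUE_TASKS = {
    "cola": "Corpus of Linguistic Acceptability (is a sentence grammatical?)",
    "sst2": "Stanford Sentiment Treebank (binary sentiment)",
    "mrpc": "Microsoft Research Paraphrase Corpus (are two sentences equivalent?)",
    "qqp":  "Quora Question Pairs (are two questions duplicates?)",
    "stsb": "Semantic Textual Similarity Benchmark (similarity score 0-5)",
    "mnli": "Multi-Genre NLI (entailment / neutral / contradiction)",
    "qnli": "Question NLI (does a sentence answer a question?)",
    "rte":  "Recognizing Textual Entailment (binary entailment)",
    "wnli": "Winograd NLI (pronoun coreference framed as entailment)",
}

if __name__ == "__main__":
    for name, desc in GLUE_TASKS.items():
        print(f"{name:5s} - {desc}")
```

The one-line descriptions are paraphrases of the official task definitions, condensed for readability.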
Initial models were diverse, including RNN-, bidirectional-LSTM-, and CNN-based deep learning architectures.
However, with the rise of BERT- and GPT-style Transformer architectures, a new and more complex benchmark called SuperGLUE had to be established. SuperGLUE contains harder questions and tasks; only specially trained humans could complete the baseline-score tasks. Here is a sample text from one of its tasks, "Multi-Sentence Reading Comprehension" (MultiRC), which is part of the SuperGLUE benchmark tests.
SuperGLUE benchmark sample text: "Former prosecutor Michael Mazzariello was finally doing the kind of legal work he'd always dreamed of. Still, after less than a year of helping East New York's poor, he's getting booted from the bodega he turned into an office. Nearly a year ago, Mazzariello, a former assistant district attorney who grew up in East New York, started a nonprofit practice helping the working poor navigate the legal system. Immigration, landlord-tenant disputes and even criminal cases are the specialty of his East New York Legal Services Corp. on New Lots Ave. In a former bodega, the office was Mazzariello's idea, and he got some help from high places early on. 'I picked up the phone and called Rudy Giuliani on his radio program,' Mazzariello said. 'I said, "Mr. Mayor, we're interested in renting space in a building the city owns." I swear, within an hour, the building was ours. We filled out all the paperwork. We got the nonprofit status from the feds. We were rolling.' Refusing to charge clients, Mazzariello, 42, said he used his family's savings to sustain the office during the first year. Already recognized as a federal nonprofit, the agency is awaiting state status that would allow it to survive on charitable donations. 'This is what I want to do – to give back to the community,' said Mazzariello, who worked under Brooklyn District Attorney Charles Hynes from 1990 to 1993, followed by a stint as the Board of Education's chief prosecutor. Under the city Housing Preservation and Development Department's tenant ownership program, Mazzariello and partner Joe Guzzo learned they could rent to own. They invested $8,500 in a new facade, rest room makeover and other modest improvements."
To score correctly on the above text, the model must answer the following questions.
Question: "How did they get the building they were working in? Did they buy it or rent it?"
Question: "Why are Mr. Mazzariello's law offices unable to operate without state non-profit status?"
Question: "Did Mazzariello have any improvements done to the bodega he rented?"
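What makes MultiRC harder than ordinary multiple choice is its format: each question comes with several candidate answers, and each candidate is independently labeled true or false, so more than one can be correct. The sketch below illustrates that format and the exact-match scoring idea with a tiny, self-contained example; the two candidate answers are hypothetical illustrations written for this article, not entries from the official dataset:

```python
from dataclasses import dataclass

# Sketch of the MultiRC format: every candidate answer carries its own
# true/false label, and more than one candidate may be correct.
@dataclass
class MultiRCQuestion:
    question: str
    candidates: list[str]
    labels: list[bool]  # gold label per candidate answer


def exact_match(gold: list[bool], predicted: list[bool]) -> bool:
    """Exact match: the model must label every candidate correctly."""
    return gold == predicted


q = MultiRCQuestion(
    question="Did Mazzariello have any improvements done to the bodega he rented?",
    candidates=[
        "Yes, a new facade and a rest room makeover",  # hypothetical candidate
        "No, he left the bodega unchanged",            # hypothetical candidate
    ],
    labels=[True, False],
)

print(exact_match(q.labels, [True, False]))  # a fully correct prediction
print(exact_match(q.labels, [True, True]))   # one wrong label fails the question
```

Exact match over all candidate labels is one of the metrics used for MultiRC, which is why partially correct answers earn no credit under it.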
SuperGLUE was introduced in 2019. The 2018 GLUE benchmark had similar multi-sentence comprehension questions, but they were much more straightforward. Here is one GLUE task: determine whether the following two sentences are equivalent.
Sentence A: What have you decided, what will you do?
Sentence B: So, what’s your decision?
In the current SuperGLUE leaderboard snapshot, the highly trained human specialist baseline sits at #8; on the GLUE leaderboard, the human baseline currently sits at #23. The SuperGLUE leaderboard is available at https://super.gluebenchmark.com/leaderboard