About

FL4NLP @ ACL 2022

Welcome to the 1st FL4NLP Workshop, co-located with ACL 2022!

Due to increasing concerns and regulations about data privacy (e.g., the General Data Protection Regulation), coupled with the growing computational power of edge devices, data generated by real users has become increasingly fragmented, forming distributed private datasets across different clients (i.e., organizations or personal devices). Out of respect for users’ privacy, and as required by these regulations, we must assume that the data held by a client cannot be transferred to a centralized server or to other clients. For example, a hospital does not want to share its private data (e.g., conversations or questions asked on its website/app) with other hospitals. This holds even though models trained on a centralized dataset (i.e., one combining data from all clients) usually achieve better performance on downstream tasks (e.g., dialogue and question answering). Therefore, it is vitally important to study NLP problems in such a scenario, where data are distributed across isolated organizations or remote devices and cannot be shared due to privacy concerns.

The field of federated learning (FL) aims to enable many individual clients to collaboratively train models while keeping their local data decentralized and completely private from other users and from any centralized server. A common training scheme in FL works in rounds: in each round, every client sends its locally updated model parameters to the server, which aggregates them into an updated global model and sends it back to all clients. Since a client's raw data is never exposed to others, FL is a promising way to address the above challenges, particularly in NLP, where much user-generated text contains sensitive personal information.
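To make the round-based scheme above concrete, here is a minimal toy sketch. It assumes federated averaging (FedAvg), one common way for the server to aggregate client parameters (a weighted average by local dataset size); the text above does not prescribe a particular aggregation rule. The model (a one-dimensional linear regressor trained with SGD), the learning rate, the client data, and the function names `local_update`, `fed_avg`, and `run_rounds` are all hypothetical choices for illustration, not any specific FL framework's API.

```python
from typing import List, Tuple

Params = List[float]                 # hypothetical model parameters [w, b] for y = w * x + b
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 5) -> Params:
    """Client side: a few epochs of per-example SGD on squared error, using only local data."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            # Gradient of 0.5 * err**2 with respect to w is err * x, and to b is err.
            w -= lr * err * x
            b -= lr * err
    return [w, b]


def fed_avg(updates: List[Params], sizes: List[int]) -> Params:
    """Server side (assumed FedAvg): average client parameters weighted by local dataset size."""
    total = sum(sizes)
    return [
        sum(p[i] * n for p, n in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]


def run_rounds(clients: List[Dataset], rounds: int = 20) -> Params:
    """Repeat: broadcast the global model, train locally, aggregate on the server."""
    global_params: Params = [0.0, 0.0]
    for _ in range(rounds):
        # Each client starts from the current global model; only the resulting
        # parameters (never the raw (x, y) pairs) are returned to the server.
        updates = [local_update(global_params, data) for data in clients]
        global_params = fed_avg(updates, [len(d) for d in clients])
        # "Sending back" the global model is modeled by reusing it next round.
    return global_params


if __name__ == "__main__":
    # Two hypothetical clients whose private points all lie on y = 2x + 1 but
    # cover different x ranges (a toy stand-in for non-IID partitions).
    client_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    client_b = [(1.5, 4.0), (2.0, 5.0)]
    w, b = run_rounds([client_a, client_b], rounds=100)
    print(f"global model: w={w:.3f}, b={b:.3f}")  # should approach w=2, b=1
```

In this sketch only parameter vectors cross the client/server boundary. Shared parameters can still leak information about local data, which is one reason defenses such as differential privacy and encryption-based aggregation are studied for FL.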

Topics of interest include, but are not limited to:

  • Federated learning methods for NLP tasks and models (e.g., Transformer-based LMs and dialog systems).
  • New learning frameworks that tackle issues related to data heterogeneity, label deficiency, data shift, and generalization ability in FL for NLP, including continual learning, multi-task learning, and self-/semi-/unsupervised learning.
  • Efficient training methods for resource-constrained on-device NLP, including training-time compression and communication-, computation-, and memory-efficient methods.
  • Security and privacy of FL for NLP, including new attack methods (e.g., data and model poisoning), defense methods (e.g., empirical and certifiable defenses), robust aggregation methods, differential privacy (DP), and homomorphic encryption (HE).
  • Fair FL for NLP, including new fairness notions for FL, mitigation of different types of bias in NLP applications under FL settings, benchmark datasets and tasks for fair FL in NLP, and auditing of NLP applications in FL settings.
  • Interpretability of FL for NLP, especially understanding how NLP models behave under data heterogeneity.
  • Scalability of FL for NLP (e.g., client sampling algorithms).
  • Benchmarking datasets (with realistic non-I.I.D. partitions) and new applications in NLP and beyond.

Program

Workshop Program


Please join us on Slack (#acl2022-fl4nlp-workshop).
Underline: LINK.
Zoom: https://us06web.zoom.us/j/87819510674?pwd=UEVleW5JSVpkUXJBQlR2TEtRV1hRZz09
Start times in Dublin time (Pacific time in parentheses), May 27, 2022:
09:00 (1:00 AM) Introduction and Opening Remarks
09:10 (1:10 AM) Invited Talk #1: Manzil Zaheer: "Federated Optimization in NLP"
10:05 (2:05 AM) Invited Talk #2: Rahul Gupta: "Federated learning for industrial systems"
11:05 (3:05 AM) Invited Talk #3: Salman Avestimehr: "Secure, Scalable, and Efficient Federated Learning"
12:00 (4:00 AM) Lunch Break
12:35 (4:35 AM) Contributed Talk #1: Backdoor Attacks in Federated Learning by Poisoned Word Embeddings
12:50 (4:50 AM) Contributed Talk #2: Efficient Federated Learning on Knowledge Graphs via Privacy-preserving Relation Embedding Aggregation
13:05 (5:05 AM) Contributed Talk #3: Pretrained Models for Multilingual Federated Learning
13:20 (5:20 AM) Contributed Talk #4: Adaptive Differential Privacy for Language Model Training
13:35 (5:35 AM) Invited Talk #4: Tong Zhang: "On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data"
14:30 (6:30 AM) Invited Talk #5: Bo Li: "Trustworthy Federated Learning"
15:25 (7:25 AM) Invited Talk #6: Virginia Smith: "Privacy Meets Heterogeneity"
16:20 (8:20 AM) Contributed Talk #5: ActPerFL: Active Personalized Federated Learning
16:35 (8:35 AM) Contributed Talk #6: Scaling Language Model Size in Cross-Device Federated Learning
16:50 (8:50 AM) Contributed Talk #7: Intrinsic Gradient Compression for Scalable and Efficient Federated Learning
17:05 (9:05 AM) Contributed Talk #8: Training a Tokenizer for Free with Private Federated Learning
17:20 (9:20 AM) Contributed Talk #9: UserIdentifier: Implicit User Representations for Simple and Effective Personalized Sentiment Analysis
17:35 (9:35 AM) Panel Discussion: Tom Diethe, Xu Zheng, Gauri Joshi, Anna Rumshisky
18:30 (10:30 AM) Concluding Remarks

Accepted Papers

Archival Papers
Non-Archival Papers

Calls

Call for Papers

Important Dates (Updated!)
  • Regular submission deadline (never-published work): Mar 7, 2022 (extended from Feb 28, 2022)
  • ARR submission deadline (submissions with ARR reviews): Mar 21, 2022
  • Notification of acceptance (for regular and ARR submissions): Mar 26, 2022
  • Published submission deadline (work published at other venues): Apr 8, 2022
  • Camera-ready papers due: Apr 10, 2022
  • Workshop date: May 27, 2022
  • All deadlines are Anywhere on Earth (AoE).

Submission Instructions

We solicit two categories of papers.

Workshop papers (regular/ARR): papers describing new, previously unpublished research in this field. Submissions should follow the ACL ARR style guidelines. We accept both short papers (4 pages of content) and long papers (8 pages of content). Submissions will undergo double-blind review (i.e., they must be anonymized). Final versions of accepted papers will be allowed one additional page of content so that reviewer comments can be addressed.
Please fill out this Google Form to submit your papers: https://forms.gle/AToY6HYZ6buydSVv5

Published papers: papers on topics relevant to the workshop theme that were previously published at NLP or ML conferences. These papers can be submitted in their original format, without anonymization. Submissions will be reviewed for fit with the workshop topics.

In both categories, accepted papers:
  • can be non-archival
  • will be published on the workshop website
  • will be presented at the workshop as a lightning talk

Cross-submission policy: As long as it does not conflict with the cross-submission policy of the other venue (e.g., the ARR policy), you may submit your paper to FL4NLP as a regular workshop paper. Please feel free to email us if you are unsure about your case!

Please submit your paper via OpenReview: https://openreview.net/group?id=aclweb.org/ACL/2022/Workshop/FL4NLP

Amazon Best (Student) Paper Awards
There will be a Best Paper Award and a Best Student Paper Award honoring exceptional papers accepted at the FL4NLP workshop, both sponsored by Amazon Alexa AI.

Talks

Invited Speakers

Salman Avestimehr

Professor at University of Southern California

Virginia Smith

Assistant Prof. at CMU

Bo Li

Assistant Prof. at UIUC

Tong Zhang

Professor at HKUST

Manzil Zaheer

Research Scientist at Google DeepMind

Rahul Gupta

Applied Science Manager at Amazon Alexa

Panel Discussion

Panelists

Tom Diethe

Amazon Research

Xu Zheng

Google FL team

Gauri Joshi

Assistant Prof. at CMU

Anna Rumshisky

Associate Prof. at UMass

Organization

Workshop Organizers

Bill Yuchen Lin

PhD Candidate @ USC

Chaoyang He

PhD Candidate @ USC

Chulin Xie

PhD Student @ UIUC

Fatemehsadat Mireshghallah

PhD Candidate @ UCSD

Ninareh Mehrabi

PhD Candidate @ USC-ISI

Tian Li

PhD Student @ CMU

Mahdi Soltanolkotabi

Associate Prof. @ USC

Xiang Ren

Assistant Prof. @ USC

Program Committee

  • Hongyuan Zhan (Facebook)
  • Anit Kumar Sahu (Amazon Alexa AI)
  • Bahareh Harandizadeh (University of Southern California)
  • Basak Guler (University of California, Riverside)
  • Dimitris Stripelis (University of Southern California)
  • Eugene Bagdasaryan (Cornell University)
  • Farzin Haddadpour (Yale University)
  • Gerald Penn (University of Toronto)
  • Hongyi Wang (Carnegie Mellon University)
  • Jinhyun So (University of Southern California)
  • Jun Yan (University of Southern California)
  • Kshitiz Malik (Facebook)
  • Kevin Hsieh (Microsoft)
  • Ninareh Mehrabi (University of Southern California)
  • Roozbeh Yousefzadeh (Yale University)
  • Saurav Prakash (University of Southern California)
  • Shen Li (Facebook)
  • Shengyuan Hu (Carnegie Mellon University)
  • Sijie Cheng (Fudan University)
  • Sunwoo Lee (University of Southern California)
  • Tao Yu (The University of Hong Kong)
  • Umang Gupta (University of Southern California)
  • Xin Dong (Harvard University)
  • Xuechen Li (Stanford University)
  • Yae Jee Cho (Carnegie Mellon University)
  • Zheng Xu (Google)

Sponsors

FedML

Contact us

Email us at fl4nlp@googlegroups.com
Join our Slack channel for more discussion!