Women in the Copyright System


The U.S. Copyright Office has issued a new report, Women in the Copyright System: An Analysis of Women Authors in Copyright Registrations from 1978 to 2020. The report examines women’s authorship rate in the U.S. copyright system, with a comparison to their participation in the copyright-based creative industries. In connection with the release of this report, the Office is also providing the reference data set of copyright registration records from 1978 to mid-2021.


Report Summary

The report reveals a complex and evolving picture of women’s participation in the creative professions and their use of the copyright registration system. While the annual number of registrations has remained relatively stable since 1978—the effective date of the current U.S. Copyright Act—the share of registrations that list women authors has risen across nearly every category. In 2020, over 38 percent of all copyright registrations were granted to women authors, as compared to 28 percent in 1978. Authorship shares vary significantly among registration categories. Some categories, notably nondramatic literary works, have reached or are moving toward gender parity, while others demonstrate moderate but sustained growth in female authorship.


Copyright registration data and occupational data exhibit a positive relationship: occupations with higher shares of women participants have higher shares of registrations listing women authors. With limited exceptions, however, the share of women authors in registrations is substantially smaller than women’s participation rate in the corresponding occupations, although this gap is shrinking. Additional outreach and education may assist in closing the gap.


This report contributes to ongoing domestic and international discussions on the gender gap in the use of intellectual property (IP) systems. The Office intends to launch additional initiatives responsive to the report findings.


Data Set

In connection with the release of the Women in the Copyright System report, the Office is providing a registration reference data set, as described below.


  • Coverage: The data set contains roughly 20 million copyright registration records, from Jan. 1, 1978, to July 8, 2021.

  • Content: The data set contains information on authors, types of works registered, publication status, and other relevant copyright registration information. More detailed descriptions of the fields, variables, and definitions can be found in the Library of Congress Copyright Data as Distributed in the MARC Format document available here.

  • The data are available in twenty-five file parts, numbered sequentially, and include document and registration records. Document and registration records for any variable or year may be spread across multiple file parts. An individual file part may not contain all available records pertaining to a specific variable and may not be suitable for analysis independent of the full data set.

  • Format: Records can currently be downloaded using Chrome or Edge. Work is in progress to determine whether they can be made available for download through other browsers.

  • Please note that the files may not be fully readable by XML readers. It is suggested to treat the content of the XML files as simple text, then parse the text and convert it into a standard data set with one observation per row and one variable per column. As of the date of the release, a set of Stata .do files that performs this conversion is in beta testing and is available on request by emailing [email protected]. Researchers should independently evaluate the quality of the transformed data set relative to the original content of the XML files.

  • The Copyright MARC 21 record structure is an implementation of the international standard Format for Information Exchange (ISO 2709) and its American counterpart, Bibliographic Information Interchange (ANSI/NISO Z39.2). Additional information on the MARC Standards is available here.

  • Frequency: The data set is static and reflects copyright registrations from January 1, 1978, to July 8, 2021. Additional updates are not planned at this time. The time frame used in the report was January 1, 1978, to December 31, 2020.

  • Methods: The information contained in the data set derives from registration public records. The information has been extracted in bulk from the public catalog and existing records material.

  • Disclaimer: This data set does not replace or supersede the online public catalog or existing search practices established by the U.S. Copyright Office, and the data set should not be relied on for legal matters. For information on searching copyright records, please refer to How to Investigate the Copyright Status of a Work (Circular 22). For information regarding requests to remove personal information from Copyright Office public records, please refer to Privacy: Copyright Public Records (Circular 18).
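
The suggestion above to treat the XML files as simple text and convert them into one observation per row can be illustrated with a minimal Python sketch. It is not the Office's Stata tooling. The element and attribute names (record, datafield, subfield, tag, code) are assumptions based on common MARCXML conventions and must be checked against the actual files and the field documentation. The sample records are synthetic.

```python
import csv
import html
import re
import sys

# ASSUMPTION: a MARCXML-style layout. The real files may use other names.
RECORD_RE = re.compile(r"<record\b[^>]*>(.*?)</record>", re.DOTALL)
DATAFIELD_RE = re.compile(
    r'<datafield\b[^>]*\btag="(\w{3})"[^>]*>(.*?)</datafield>', re.DOTALL
)
SUBFIELD_RE = re.compile(
    r'<subfield\b[^>]*\bcode="(\w)"[^>]*>(.*?)</subfield>', re.DOTALL
)

# Synthetic sample. The raw "&" in the second record would make a strict
# XML parser fail, which is why the text is scanned with regular expressions.
SAMPLE = """
<record>
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An Example Title &amp; More</subfield></datafield>
</record>
<record>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Smith & Sons</subfield></datafield>
</record>
"""


def records_to_rows(text):
    """Return one dict per record, keyed by field tag plus subfield code."""
    rows = []
    for record in RECORD_RE.finditer(text):
        row = {}
        for tag, body in DATAFIELD_RE.findall(record.group(1)):
            for code, value in SUBFIELD_RE.findall(body):
                key = f"{tag}{code}"
                value = html.unescape(value).strip()
                # Repeated fields are joined so each record stays on one row.
                row[key] = f"{row[key]} | {value}" if key in row else value
        rows.append(row)
    return rows


def write_csv(rows, out):
    """Write rows as CSV: one observation per row, one variable per column."""
    columns = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    write_csv(records_to_rows(SAMPLE), sys.stdout)
```

Because records for a given variable or year may be spread across several file parts, a real conversion would apply the same step to all twenty-five parts before analysis. As the note above advises, the output should be compared against the original file content.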