Women in the Copyright System


The U.S. Copyright Office has issued a new report, Women in the Copyright System: An Analysis of Women Authors in Copyright Registrations from 1978 to 2020. The report examines women’s authorship rate in the U.S. copyright system, with a comparison to their participation in the copyright-based creative industries. In connection with the release of this report, the Office is also providing the reference data set of copyright registration records from 1978 to mid-2021.


Report Summary

The report reveals a complex and evolving picture of women’s participation in the creative professions and their use of the copyright registration system. While the annual number of registrations has remained relatively stable since 1978—the effective date of the current U.S. Copyright Act—the share of registrations that list women authors has risen across nearly every category. In 2020, over 38 percent of all copyright registrations were granted to women authors, as compared to 28 percent in 1978. Authorship shares vary significantly among registration categories. Some categories, notably nondramatic literary works, have reached or are moving toward gender parity, while others demonstrate moderate but sustained growth in female authorship.


Copyright registration data and occupational data exhibit a positive relationship: occupations with higher shares of women participants have higher shares of registrations listing women authors. With limited exceptions, however, the share of women authors in registrations is substantially smaller than women’s participation rate in the corresponding occupations, although this gap is shrinking. Additional outreach and education may assist in closing the gap.


This report contributes to ongoing domestic and international discussions on the gender gap in the use of intellectual property (IP) systems. The Office intends to launch additional initiatives responsive to the report findings.


Data Set

In connection with the release of the Women in the Copyright System report, the Office is providing a reference data set of public catalog data, as described below.


  • Coverage: The data set contains copyright registration records, copyright renewal records, and recorded document records, from January 1, 1978, to July 8, 2021.

  • Content: The data set contains information on authors, types of works registered, publication status, and other relevant copyright information. More detailed descriptions of the fields, variables, and definitions can be found in the Library of Congress Copyright Data as Distributed in the MARC Format document available here.

  • The data are available in twenty-six file parts, numbered sequentially, and include recorded document and registration records. Registration and recorded document records for any variable or year may be present in multiple file parts. Individual file parts may not contain all available records pertaining to any specific variable and may not be suitable for analysis independent of the full data set.

  • Format: Records are available to download as zipped XML files.

  • Please note that the files may not be fully readable by all XML readers. The suggested approach is to treat the content of the XML files as plain text, parse that text, and convert it into a standard data set with one observation per row and one variable per column. A set of Stata .do files that performs this function is available by emailing [email protected]. Researchers should independently evaluate the quality of the transformed data set relative to the original content of the XML files.

  • The Copyright MARC 21 record structure is an implementation of the international standard Format for Information Exchange (ISO 2709) and its American counterpart, Bibliographic Information Interchange (ANSI/NISO Z39.2). Additional information on the MARC Standards is available here.

  • Frequency: The data set is static and reflects copyright registrations, copyright renewals, and recorded documents from January 1, 1978, to July 8, 2021. The timeframe used in the report was January 1, 1978, to December 31, 2020. The Office may occasionally update the data set to reflect activity within that period more accurately. Those who have downloaded an earlier version of the data set are encouraged to replace it with the most recent version.

  • Version History:
    • June 9, 2022: Original data release¹ [version 1.0]
    • October 25, 2022: Revised data release [version 2.0]
      • About 3 percent of the data points revised to more accurately reflect the registration and recordation activity within the period covering January 1, 1978, to July 8, 2021.
      • Out-of-scope datafields removed.
      • File format changed to address download issues.
      • File and link naming conventions updated.

  • Methods: The information contained in the data set derives from Copyright Office public records. The information has been extracted in bulk from the public catalog and existing records material.

  • Disclaimer: This data set does not replace or supersede the online public catalog or existing search practices established by the U.S. Copyright Office, and the data set should not be relied on for legal matters. For information on searching copyright records, please refer to How to Investigate the Copyright Status of a Work (Circular 22). For information regarding requests to remove personal information from Copyright Office public records, please refer to Privacy: Public Copyright Registration Records (Circular 18).
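
The text-based conversion suggested under Format (read each XML file as plain text, then build a table with one observation per row and one variable per column) can be sketched in Python as follows. This is an illustrative alternative to the Stata .do files mentioned above, not the Office's own method. It assumes MARCXML-style element names without namespace prefixes (record, controlfield, datafield, subfield), which should be checked against the actual files. The function names and the column naming scheme (field tag plus subfield code, such as 245a) are hypothetical choices made for this example.

```python
import csv
import re
import zipfile
from html import unescape

# Assumed MARCXML-style element names; adjust these patterns to the real files.
RECORD_RE = re.compile(r"<record\b[^>]*>(.*?)</record>", re.DOTALL)
CONTROL_RE = re.compile(
    r'<controlfield\b[^>]*\btag="([^"]+)"[^>]*>(.*?)</controlfield>', re.DOTALL)
DATAFIELD_RE = re.compile(
    r'<datafield\b[^>]*\btag="([^"]+)"[^>]*>(.*?)</datafield>', re.DOTALL)
SUBFIELD_RE = re.compile(
    r'<subfield\b[^>]*\bcode="([^"]+)"[^>]*>(.*?)</subfield>', re.DOTALL)


def _add(row, column, raw_value):
    """Store a value under a column; repeated fields are joined with ' | '."""
    value = unescape(raw_value).strip()
    row[column] = f"{row[column]} | {value}" if column in row else value


def parse_records(text):
    """Yield one dict per record, keyed by tag (control fields) or tag+code."""
    for record in RECORD_RE.finditer(text):
        body = record.group(1)
        row = {}
        for tag, value in CONTROL_RE.findall(body):
            _add(row, tag, value)
        for tag, field_body in DATAFIELD_RE.findall(body):
            for code, value in SUBFIELD_RE.findall(field_body):
                _add(row, tag + code, value)
        yield row


def rows_from_zip(zip_path):
    """Read every .xml member of a zipped file part as text and parse it."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                # Decode leniently so stray bytes do not abort the run.
                text = archive.read(name).decode("utf-8", errors="replace")
                yield from parse_records(text)


def write_csv(rows, out_path):
    """Write rows to CSV: one record per row, one variable per column."""
    rows = list(rows)
    columns = sorted({column for row in rows for column in row})
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

A call such as write_csv(rows_from_zip("part01.zip"), "part01.csv") would convert one file part, where both file names are placeholders. The sketch loads each XML member fully into memory, so very large file parts may need a chunked reader instead. Because records for a given variable or year may be spread across several file parts, the per-part outputs should be combined before analysis and checked against the original XML content.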


1. Date when the U.S. Copyright Office first distributed the data set.