Public Access and Data Sharing August 2017

>>Elizabeth Albro: Thank
you for listening to this recorded webinar on IES
Public Access and Data Sharing. I am Elizabeth Albro from
the National Center for Education Research, NCER,
and I am joined by my colleague.>>Kimberley Sprague: Kim
Sprague from the National Center for Special Education
Research, NCSER.>>Elizabeth Albro: We are
going to start with one slide about IES, but
please refer to the “New to IES” recorded webinar for
more information about IES. We will then turn briefly to the
background regarding the IES data access policies,
and then we will present an overview for today. So, what is IES? IES is the research
arm of the U.S. Department of Education. We are nonpartisan by law,
and we were created by the Education Sciences
Reform Act of 2002. Under ESRA, we are charged
with providing rigorous and relevant evidence on
which to ground education practice and policy and to
share this information broadly. Our hope is that by
identifying what works, what doesn’t, and why, we
will be able to improve educational outcomes for
all students, particularly those at risk for failure. The prior slide
illustrates our mandate to share our findings and
illustrates the importance of our public access
policies to support those in the field of education
and to achieve our goals. Given IES’s commitment
to sharing what we are learning from the research
that we fund, IES implemented a Public
Access Policy in 2012. This policy required that
all grantees submit their peer reviewed scholarly
publications to the Department’s electronic
library, ERIC, and required that a small
subset of funded grantees prepare data management
plans to share their data with the public,
ensuring, of course, that all the data would be
appropriately de-identified and privacy concerns
addressed, and describing how they would ensure that
the data were used for research purposes. And then, in February
2013, Dr. John Holdren, the Director of the Office
of Science and Technology Policy, issued a memorandum
to all federal agencies. This memorandum,
“Increasing Access to the Results of Federally
Funded Scientific Research,” directed all
federal agencies with annual research
expenditures greater than $100 million to prepare
public access plans. These plans were to detail
how each agency would extend public access
requirements for both publications and data
resulting from federal funding to all
research awards. To implement this
ambitious agenda, the memorandum led to the
creation of an interagency working group on
public access. I served as one of three
Department of Education representatives
to this group. Over three and a half
years, the group worked diligently to brainstorm
common solutions and to share resources
and expertise. Our colleagues at the
National Institutes of Health, for example,
provided substantial leadership to the group as
they have spent more than a decade working to
implement their public access policies around
publications via PubMed. The ED Plan was approved
on October 21st, 2016, joining the other agencies
that have established development plans across
the federal government. The IES requirements were
then updated as a result to align with the
final ED Plan. Note that the IES policy
and the ED Plan can both be found on our website.
The full link is
/researchaccess.asp. There are two components
of the IES Public Access Policy, one for research
publications and another for research data. Each of the full policy
documents can be found on our website. Look at the bottom of our webpage
under “IES Policies and Standards” for Public
Access Policy. These policy documents
provide an overview of the IES expectations for
providing public access to the scholarly publications
and the data resulting from the work that we fund
as well as more detailed information regarding
these requirements. You can view, download, or
print the full plans as a PDF file from the website. We’re now going to move
to a discussion of the requirements and we’ll
begin with the IES research publication
requirements. PIs are expected to submit
their manuscripts to ERIC, the department’s digital
library, as soon as a manuscript is accepted
for publication. The final approved
manuscript must be submitted to ERIC on
ERIC’s submission site, including information
about which grant or grants provided funding to
support the research that was reported. This manuscript is then
made public only after the officially agreed upon
embargo period has passed. For most journals, this
time period is 12 months after the publication
of the manuscript. By law, the department is
required to make public all manuscripts that
have been accepted for publication no later
than 12 months after the publication time. Although the slide above
indicates the minimum requirement, in practice,
IES expects its grantees to submit all of their
peer reviewed scholarly publications emerging
from IES funding to ERIC. Thus, we expect posting of
any accepted publications that describe what has
been learned, such as a description of an
iterative development process in addition to the
findings from an initial pilot study, an
implementation study published separately
from the results from an efficacy or effectiveness
study as well as validation findings from
a measurement study. In addition, although many
research firms with IES contracts have their
results published or disseminated by IES, and
those publications are also included in ERIC, any
peer reviewed scholarly publications that are
published outside of IES are still required to
be submitted to ERIC. So note, the author’s final
manuscript is defined as the final version accepted
for journal publication and includes all
modifications from the peer review process. It is not the final
preprint that you receive for copy editing purposes. In addition, I’d like
everyone to know that we are encouraging all
researchers who received grants from IES prior
to 2012 to submit their publications from IES
funded grants and contracts to ERIC as well. Our goal is to have all of
the findings and all of the knowledge that has
been generated from IES-funded research
available to the public via ERIC. Beginning with awards made
in Fiscal Year 2013, IES established a requirement
for data sharing for Goal 4 Effectiveness Grants
and this requirement was extended to include Goal 3
Efficacy and Replication Grants in Fiscal
Year 2015. This includes, again,
at a minimum, the final research data used to
answer the primary research questions the grantee
received funding to answer. In addition, in Fiscal
Year 2016, IES expanded the requirement for data
sharing, and included it in the Research Networks
on Critical Problems of Policy and Practice as
well as in the National Research and Development
Centers competitions for Fiscal Year 2018 that
were just released. Our intent is to have
these requirements be extended to most, if
not all, of the funded projects that we
support going forward. As part of the data
component of this policy and these requirements,
principal investigators are required to provide
access to an electronic version of their
final research data. The final research data is
defined as the recorded factual material commonly
accepted in the scientific community as necessary
to document and support research findings. Note that the final
research data does not mean summary statistics
or tables, but rather the factual information on which
the summary statistics and tables are based. For the purposes of this
policy, please know that final research data do
not include laboratory notebooks, preliminary
analyses, drafts of scientific papers,
plans for future research, peer review reports, or
communications with colleagues. For most studies, an
electronic file will constitute the final
research data. It will include the final
clean data and it may include both original data
and derived variables which will be fully described
in accompanying documentation. The data must ensure that
other researchers can replicate the final
analysis and findings published for the grant. Again, please note that
data can be made available earlier as it
is appropriate. Researchers are encouraged
to share data that will inform the field more
broadly than may be feasible only through
published studies. As noted on the prior
slides, IES established data sharing requirements
for Goal 4 and Goal 3 studies, that is, for studies that
were engaged in testing a causal impact, as well as
for certain other research programs as specified in the relevant Requests
for Applications. And these requirements
must be met in applications
submitted to IES. The way this requirement
must be met is to include a data management plan
in your application. Compliance with the IES
data sharing requirements is expected even though
the final data set may not be completed and prepared
for data sharing until after the grant
has been completed. In cases where the PI or
grantee is noncompliant with the requirements of
data sharing policy or the data management plan,
subsequent awards to individuals or
institutions may be affected. Grantees are required to
provide access to the final research data from
grants in a timely fashion and no later than the time
of publication in a peer reviewed scholarly
publication. In providing public access
to data, researchers must protect the rights and
privacy of human subjects at all times. This information should be
included in the DMP as an appendix to the
application. Please refer to the relevant RFA, or
Request for Applications, for more
elaborated explanations and details about
what’s required.>>Kimberley Sprague: Data
documentation should be a comprehensive and
stand-alone document that includes all the
information necessary to replicate the analysis
performed by the original research team. This documentation should
be described in detail in your DMP. There are several types of
documentation: some describes how someone could
use these data, and some supports discoverability. For anyone to make use
of the data, a detailed description must
be included. And this not only involves
the formatting of the data, but also information
about the research design and the timing of data
collection methods (things that may impact someone’s
ability to use the data to answer their own
research questions). In addition, all of this
information supports the ability to replicate the
results that were reported using the final data. What is most important is
that researchers document everything and strive
to make notation as interpretable as possible. Documentation for
discovery involves the capacity of individuals
to find these data via an internet search. The provision of metadata
enhances this ability and these metadata provide
information about the dataset to help people find,
understand, and use these data. There are emerging
standards in the field about metadata, but
currently there is not a set of standards for
this in education research. The publisher assigns a
digital object identifier, also known as a DOI, when
your article is published and made available
electronically. Depending on the structure
of the data and the file, this information can be
embedded directly into the data file, or it can be
included as a separate file. And these DOIs are
important to include. They really help people
discover where your data are. We have talked
already about grantee requirements, what
constitutes final data, and the meaning
of public access. Applicants have asked for
clarification about these requirements over time
since we began asking for people to submit data
management plans for our Goal 3s and our Goal 4s. So, one example is the
clarification related to public access. We have no requirement for
open access to the public. A grantee can require
restricted access but we need to have it explained
to us why that’s necessary. So, we’re going to cover
a few additional concerns that have arisen since
the inclusion of DMPs in our application. We’ve had some questions
about researcher access and ownership of the data. And while the federal
government retains certain rights, data used in
federally funded grants are considered to be
products of the grant as per our EDGAR regulations. IES does acknowledge
that there may be issues associated with providing
access to data when the data collected
are proprietary, for example, when a
published curriculum is being evaluated. While researchers and
institutions have access to their own data over
time, the final data should be maintained for
10 years and provided to the public when requested
or provided via a public access system. Discuss these requirements
with your institution and identify where there may
be issues to discuss with your IES program officer. Any restrictions on
data-sharing such as a delay of disclosing
proprietary data should be presented in the DMP, and
those will be discussed with your program
officer as well. If proprietary issues
emerge during the course of the research, they
should also be brought to the attention of your IES
program officer, and the DMP can be reviewed in
light of these issues. IES requires 10 years of
access for the final data, as we’ve mentioned, and
the available methods can include, one, the investigator
and institution taking on the responsibility for the
data sharing; two, the use of a data archive or data
enclave, which would mean providing data that were
public after the duration of the grant and accessible
through those methods; or three, use of some
combination of these methods for what’s
available publicly versus what might be a
restricted access. The resources for
maintaining the data can be obtained from the
institution or the grant or a combination as well. The individuals providing
the access have to be identified at the
university or institution level. They cannot be just the PI
or Co-PI because people can change institutions,
and ultimately, it’s the institution’s
responsibility for the grant and providing
the access. So, a data archive is a
place where data can be stored and distributed to
the scientific community for further analysis. Data archives typically
require extensive data documentation and work
to ensure privacy and confidentiality
standards are met. Data archives can be
particularly attractive for investigators
concerned about a large volume of requests,
vetting these requests, and providing technical
assistance to users. A data enclave provides
a controlled, secure environment in which
eligible researchers can perform analysis using
restricted data resources without downloading
the data to their own computer. Researchers can use a data
archive or enclave when data sets cannot be distributed
to the general public, for example, because of participant
confidentiality concerns or third-party licensing
or user agreements that prohibit redistribution. However, if any of these
concerns arise while you are developing your DMP,
please contact the program officer to talk through
the requirements and how they can be met. Next, we will go back to
Liz to talk about the disclosure risks for data.>>Elizabeth Albro: Another
set of questions that often come up are around
disclosure risks. What are they and how do
you plan to prevent them to the degree that
you can in your DMP? So, a disclosure risk is a
circumstance that prevents all or some of the data
from being shared. And much of this is
defined through FERPA and other privacy
laws and guidance. However, please note that
it’s not a barrier to the use of de-identified data. So, here are
some examples. Researchers conducting
research at a group level, that is, at a school or a
classroom level, for example, may note that their
numbers are too few, and therefore their cases
could be identifiable, right? So, if you have a
particular population of students that you’re
working with and there are only three of those
students in a classroom, then it’s likely that
someone could infer back who those particular
students are. If that is true, then
there is a disclosure risk and it would be improper
and, in fact, illegal to share that data with
other individuals. In addition, disclosure
risk might happen if researchers are naming
schools or districts that they are working with in
their applications and note that the data from
these schools or districts can be identified. So, something to consider
as you’re pulling your plan together is the
degree of specificity in terms of specifying the
actual schools you’re working with or if you
want to keep it at the general district level. In addition, sometimes
schools and districts might raise concerns that
sharing the data will violate FERPA or other IRB
requirements, and so one of the things that we need
to do is work with PIs to help them understand
that in fact in most cases, sharing data
is not a violation of FERPA, nor of their IRB,
but there are things that you need to have in place
in order to make sure that you meet those
requirements. So, for example, FERPA
allows for schools and districts to provide
access to student data for research purposes. The “Dear Colleague
Letter” about the Family Educational Rights and
Privacy Act, which is the full name for FERPA, final
regulations includes the following language in the
section titled, Release of Data. I’m going to read you this
quote, “While FERPA is a privacy statute and not
a research statute, it should not be a barrier
to conducting useful and valid educational research
that uses de-identified student data. Educational agencies and
institutions are permitted to release, without
consent, education records or information from
education records that have been de-identified
through the removal of all personally identifiable
information. The final regulations
amend the definition of personally identifiable
information and offer guidance to educational
agencies and institutions, as well as state
educational authorities, on determining how to
de-identify information. The final regulations also
identify factors that should be considered before
releasing this information.” So, what does this mean
for you as a researcher preparing your DMP? I think what this means is
that you need to be well versed in the final
regulations around FERPA, so that if a school or
a district puts up some roadblocks against your
collecting the type of data that you’re proposing
to collect, and then sharing it
afterwards, you have complete information about
what the restrictions are and are not. So, people need to be able
to recreate or replicate the analyses and results
that are published. But they are also
interested in using and exploring the data to
answer new research questions. So, how can a PI create
a data set that allows people to answer these new
questions, but at the same time prevents or limits
disclosure risks? So some of the solutions
that we’ve come up with so far, and I’m sure there
are others, include things like de-identifying
a portion of the data; using data repositories
and archives, which we’ll talk more about in
the next slide; and creating memoranda of understanding
with the school, the district, or the state
partners, and including in these agreements IRB and
informed consent documentation. PIs have also proposed
developing different processes by which
researchers can apply to have access to the data or to
a restricted form of the data. One other solution is to
engage a partner, a state, or a district to serve
in a gatekeeping role. Another possible solution
is for the investigators to develop a mixed method
or a differentiated method, if you will, for
data sharing that allows more than one version of
the data set to exist and to be shared, and it
provides different levels of access depending
on the version. For example, a redacted
dataset could be made available for general use,
but stricter controls through a data archive
or an enclave would be applied if access to more
sensitive data was required. The key point, as you’re
thinking through this balance of sharing data
and minimizing disclosure risks, is to know that
researchers must always protect the rights and
privacy of human subjects. If you continue to have
questions about how to manage disclosure risks
and collecting data, we wanted to make sure that
you were aware of the Privacy Technical
Assistance Center, or the PTAC, help desk. This is the department
location where stakeholders can ask questions
related to matters of privacy, confidentiality, and
security of education data. In addition, state
education agencies, local education agencies and
postsecondary institutions can request, free of
charge, onsite technical assistance to learn about
best security practices for protecting
education data systems. Your program officer is
also a resource, and the IES website has current
materials; we will continue to add additional
and updated materials as we learn more about
this process, and I would encourage you to go back
there and explore to see what
new information is available. Before we close we wanted
to make sure that you knew that sections of this
document were pulled from ICPSR’s Guide to
Social Science Data Preparation and Archiving. The full guide can be
found at the following location:
files/icpsr/access/dataprep.pdf. If you have any other
questions or things you’d like Kim or me to
address, our emails are included below. [email protected]
and for Kim Sprague, it’s [email protected]. You can also follow us on
Twitter, @IESresearch, or on Facebook, or just
go to our website. There’s lots of
information there, and there will be more as we
learn more about this whole process.
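
As a rough illustration of the small-group disclosure risk described earlier in this webinar (for example, only three students from a particular population in one classroom), the following Python sketch flags groups whose counts fall below a threshold. The threshold of 10, the field names, and the function name are all assumptions made for illustration; the webinar does not specify any of them, and the actual rules should come from the FERPA final regulations, PTAC, and your IRB.

```python
from collections import Counter

# Hypothetical minimum group size; NOT an IES or FERPA rule.
MIN_CELL_SIZE = 10


def find_small_cells(records, group_keys, min_size=MIN_CELL_SIZE):
    """Return {group: count} for every group with fewer than min_size records."""
    counts = Counter(tuple(r[k] for k in group_keys) for r in records)
    return {group: n for group, n in counts.items() if n < min_size}


# Made-up records: 3 students in one subgroup, 12 in another, same classroom.
records = (
    [{"classroom": "A", "subgroup": "X"} for _ in range(3)]
    + [{"classroom": "A", "subgroup": "Y"} for _ in range(12)]
)

risky = find_small_cells(records, ["classroom", "subgroup"])
print(risky)  # {('A', 'X'): 3}
```

Groups flagged this way might be combined with other groups, suppressed, or moved to a restricted-access version of the data set, depending on what your DMP and your program officer agree to.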
