The Penn Library Data Farm is a repository of quantitative information developed to aid the measurement and assessment of library resource use and organizational performance. In its design, this repository is multipurpose, providing space to assemble, process, integrate, analyze, and disseminate data. Functions of the Data Farm include:

  • maintaining and organizing raw and processed log files from web and other kinds of servers

  • developing and executing custom programs that parse transaction logs and extract data for processing

  • running custom programs to generate and update statistics on digital and physical library use

  • generating specialized web use reports with off-the-shelf log analyzer software

  • documenting data definitions, scripts and other data processing methods

  • providing and maintaining web access to raw and processed data sets that can be used for detailed analysis by librarians who need quantitative information in their work

  • accessing, by a secure means, vendor-generated reports on the Penn community's use of licensed resources.

See the Schematic View for further details about the repository architecture.
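
As an illustration of the log-parsing and statistics functions listed above, the following is a minimal sketch and not the Data Farm's actual code. It assumes web server logs in the Apache Common Log Format, which this document does not specify; the names CLF_PATTERN, parse_line, and daily_successful_requests are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: log lines follow the Apache Common Log Format. The formats
# actually handled by the Data Farm are not described in this document.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def daily_successful_requests(lines):
    """Count successful (2xx) requests per calendar day."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        # Skip unparseable lines and non-2xx responses.
        if record is not None and 200 <= record["status"] < 300:
            counts[record["time"].date().isoformat()] += 1
    return counts
```

A script of this kind could be run on a schedule against each new log file, with the resulting counts appended to a running statistics table.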

The Data Farm project follows these basic design principles:

  • Reports must be self-executable
    All raw and processed files in the Data Farm are built automatically by scripts that run on defined schedules without direct human mediation.

  • A uniform set of data processes must govern summary statistical output for consistency across reports and over time

  • Wherever possible, the repository must make semi-processed or raw data files available to staff who wish to study use in greater detail than the Data Farm's summary reports allow. A space, the Shared File Depository, is provided for posting and exchanging ad hoc reports built from Data Farm resources, vendor reports, or other data sets.

  • The site must preserve the confidentiality of library patrons in processed reports and published data files

  • The library community will be provided open access to scripts and other nonproprietary tools used to build and maintain the repository. See the Script Library for further information.
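
The confidentiality principle above could be met in several ways, and this document does not say which method the Data Farm uses. The sketch below shows one common approach, offered only as an assumption: replacing identifying fields with a keyed hash (HMAC-SHA256) before a data file is published. The function names, the field names "host" and "patron_id", and the secret key handling are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Map an identifier to a stable token that cannot be reversed without the key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; 16 hex characters is an arbitrary choice.
    return digest.hexdigest()[:16]


def scrub_record(record, secret_key, id_fields=("host", "patron_id")):
    """Return a copy of the record with identifying fields pseudonymized.

    Assumption: records are dicts, and the fields named in id_fields are
    the only ones that identify a patron.
    """
    cleaned = dict(record)
    for field in id_fields:
        if field in cleaned:
            cleaned[field] = pseudonymize(str(cleaned[field]), secret_key)
    return cleaned
```

Because the same identifier always maps to the same token under a given key, repeat use by one patron can still be counted in a published file without exposing who the patron is.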

The Data Farm is a utility for staff who need quantitative information to manage resources, improve service, and assess library performance and impact. It is not a static warehouse of figures but a dynamic program that, to the greatest possible extent, equips staff to analyze and assess their work independently. The development of this site and the manner of its presentation are motivated by the need for empirical data that support planning and the achievement of goals, and the site will evolve according to this principle.