We present a light formalism for proofs that encodes their inferential
structure, along with a system that transforms these representations into
flow-chart diagrams. Such diagrams should improve the comprehensibility of
proofs. We discuss language syntax, diagram semantics, and our goal of building
a repository of diagrammatic representations of proofs from canonical
mathematical literature. The repository will be available online in the form of
a wiki at proofflow.org, where the flow chart drawing software will be
deployable through the wiki editor.
We introduce T2Ku, an open source project that aims at building a semantic
wiki of mathematics featuring automated reasoning (AR) techniques. We want to
utilize AR techniques in a way that truly helps mathematical researchers solve
problems in the real world, instead of building another ambitious yet useless
system. By setting this as our objective, we exploit pragmatic design decisions
that have proven feasible in other projects, while still employing a loosely
coupled architecture to allow better inference programs to be integrated in the
future.
Interactive visualizations for exploration and retrieval have not yet become an
integral part of digital libraries and information retrieval systems. We have
integrated a set of interactive graphics in a real world social science digital
library. These visualizations support the exploration of search queries,
results and authors, can filter search results, show trends in the database and
can support the creation of new search queries. The use of weighted brushing
supports the identification of related metadata for search facets.
This paper raises concerns about the advantages of using statistical
significance tests in research assessments as has recently been suggested in
the debate about proper normalization procedures for citation indicators.
Statistical significance tests are highly controversial and numerous criticisms
have been leveled against their use. Based on examples from articles by
proponents of the use of statistical significance tests in research assessments,
we address some of the numerous problems with such tests.
To preserve access to digital content, we must preserve the representation
information that captures the intended interpretation of the data. In
particular, we must be able to capture performance dependency requirements,
i.e. to identify the other resources that are required in order for the
intended interpretation to be constructed successfully. Critically, we must
identify the digital objects that are only referenced in the source data, but
are embedded in the performance, such as fonts.
One is inclined to conceptualize impact in terms of citations per
publication, and thus as an average. However, citation distributions are
skewed, and the average has the disadvantage that the number of publications is
used in the denominator. Using one hundred percentiles, one can integrate the
normalized citation curve and develop an indicator that can be compared across
document sets because percentile ranks are defined at the article level.
Concept Analysis provides a principled approach to effective management of
wide area information systems, such as the Nebula File System and Interface.
This not only offers evidence to support the assertion that a digital library
is a bounded collection of incommensurate information sources in a logical
space, but also sheds light on techniques for collaboration through coordinated
access to the shared organization of knowledge.
Citation distributions are so skewed that using the mean or any other central
tendency measure is ill-advised. Unlike G. Prathap's scalar measures (Energy,
Exergy, and Entropy or EEE), the Integrated Impact Indicator (I3) is based on
non-parametric statistics using the (100) percentiles of the distribution.
Observed values can be tested against expected ones; impact can be qualified at
the article level and then aggregated.
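The percentile-based aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy reference set, and the convention of counting papers with strictly fewer citations are all assumptions made for the example.

```python
def percentile_rank(citations, reference_set):
    """Percentage of papers in the reference set with strictly fewer citations.

    Assumption: other tie-handling conventions exist; this is one simple choice.
    """
    below = sum(1 for c in reference_set if c < citations)
    return 100.0 * below / len(reference_set)


def integrated_impact(document_set, reference_set):
    """Sum the article-level percentile ranks of a document set."""
    return sum(percentile_rank(c, reference_set) for c in document_set)


# Hypothetical reference set of ten papers (citation counts).
reference = [0, 0, 1, 2, 3, 5, 8, 13, 21, 100]

one_hit = [100]   # a single highly cited paper
steady = [5, 8]   # two moderately cited papers

print(integrated_impact(one_hit, reference))  # 90.0
print(integrated_impact(steady, reference))   # 110.0
```

In this toy case the mean citation count favors the single highly cited paper (100 versus 6.5), whereas the summed percentile ranks favor the two-paper set, because each paper is first qualified at the article level and only then aggregated.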
It is becoming common to archive research datasets that are not only large
but also numerous. In addition, their corresponding metadata and the software
required to analyse or display them need to be archived. Yet the manual
curation of research data can be difficult and expensive, particularly in very
large digital repositories, hence the importance of models and tools for
automating digital curation tasks.
The h-index is a popular bibliometric indicator for assessing individual
scientists. We criticize the h-index from a theoretical point of view. We argue
that for the purpose of measuring the overall scientific impact of a scientist
(or some other unit of analysis) the h-index behaves in a counterintuitive way.
In certain cases, the mechanism used by the h-index to aggregate publication
and citation statistics into a single number leads to inconsistencies in the
way in which scientists are ranked.
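A small sketch may make this kind of inconsistency concrete. The citation counts below are invented for illustration and are not taken from the paper: two scientists each add the same two new publications, yet their h-index ranking reverses.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical publication records (citations per paper).
x_before = [4, 4, 4, 4]
y_before = [5, 5, 5]

# Both scientists publish two new papers that receive five citations each.
new_papers = [5, 5]
x_after = x_before + new_papers
y_after = y_before + new_papers

print(h_index(x_before), h_index(y_before))  # 4 3
print(h_index(x_after), h_index(y_after))    # 4 5
```

Scientist X is ranked above Y before the identical improvement and below Y afterwards, which is the sort of counterintuitive behavior the argument refers to.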
The Mizar Mathematical Library (MML) is a rich database of formalized
mathematical proofs (see this http URL). Owing to its large size (it
contains more than 1100 "articles" summing to nearly 2.5 million lines of text,
expressing more than 50000 theorems and 10000 definitions using more than 7000
symbols), the nature of its contents (the MML is slanted toward pure
mathematics), and its classical foundations (first-order logic, set theory,
natural deduction), the MML is an especially attractive target for research on
foundations of mathematics.
We investigate how author name homonymy distorts clustered large-scale
co-author networks, and present a simple, effective, scalable and generalizable
algorithm to ameliorate such distortions. We evaluate the performance of the
algorithm to improve the resolution of mesoscopic network structures. To this
end, we establish the ground truth for a sample of author names that is
statistically representative of different types of nodes in the co-author
network, distinguished by their role for the connectivity of the network.
Two commonly used ideas in the development of citation-based research
performance indicators are the idea of normalizing citation counts based on a
field classification scheme and the idea of recursive citation weighting (like
in PageRank-inspired indicators). We combine these two ideas in a single
indicator, referred to as the recursive mean normalized citation score
indicator, and we study the validity of this indicator. Our empirical analysis
shows that the proposed indicator is highly sensitive to the field
classification scheme that is used.
Radicchi, Fortunato, and Castellano [arXiv:0806.0974, PNAS 105(45), 17268]
claim that, apart from a scaling factor, all fields of science are
characterized by the same citation distribution. We present a large-scale
validation study of this universality-of-citation-distributions claim. Our
analysis shows that claiming citation distributions to be universal for all
fields of science is not warranted.
Recent advances in methods and techniques enable us to develop an interactive
overlay to the global map of science based on aggregated citation relations
among the 9,162 journals contained in the Science Citation Index and Social
Science Citation Index 2009 combined. The resulting mapping is provided by
VOSViewer. We first discuss the pros and cons of the various options: cited
versus citing, multidimensional scaling versus spring-embedded algorithms,
VOSViewer versus Gephi, and the various clustering algorithms and similarity
criteria.
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science.
Mathematical knowledge is a central component in science, engineering, and
technology (documentation). Most of it is represented informally, and -- in
contrast to published research mathematics -- subject to continual change.
Unfortunately, machine support for change management has either been very
coarse grained and thus barely useful, or restricted to formal languages, where
automation is possible.
In this paper we present a model based on the principles of Linked Data that
can be used to describe the interrelationships of images, texts and other
resources to facilitate the interoperability of repositories of medieval
manuscripts or other culturally important handwritten documents. The model is
designed from a set of requirements derived from the real world use cases of
some of the largest digitized medieval content holders, and instantiations of
the model are intended as the input to collection-independent page turning and
scholarly presentation interfaces.
The creation of a next generation internet (semantic web) is impossible
without attributes, allowing the semantic association of documents and their
integration into information context. To achieve these goals, the Universal
Metadata Standard (UMS) may be an ultimate tool, which could serve as a basis
for documentography, and is functionally required for interpretation of
documents by the automatic operating systems.
In contrast to many other scientific disciplines, computer science considers
conference publications. Conferences have the advantage of providing fast
publication of papers and of bringing researchers together to present and
discuss the paper with peers. Previous work on knowledge mapping focused on the
map of all sciences or a particular domain based on ISI published JCR (Journal
Citation Report). Although this data covers most of the important journals, it
lacks computer science conference and workshop proceedings.
Multivariate linear regression models suggest a trade-off in allocations of
national R&D investments. Government funding, and spending in the higher
education sector, seem to encourage publications, whereas other components such
as industrial funding, and spending in the business sector, encourage
patenting. Our results help explain why the US trails the EU in publications,
because of its focus on industrial funding - some 70% of its total R&D
investment. Conversely, they also help explain why the EU trails the US in
patenting.
We discuss the "rate of averages" versus the "average of rates" in the case
of the impact factor. Synchronous as well as diachronous journal impact factors
are sensitive to adding non-cited articles (to the denominator). This is a
consequence of basic properties of elementary arithmetic. Our findings provide
a rationale for not taking uncitable publications into account in impact factor
calculations, at least if these items are truly uncitable, that is, are never
cited.
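The arithmetic behind this sensitivity can be sketched with invented numbers. The sketch assumes that the "rate of averages" is read as total citations over total items and the "average of rates" as the mean of the per-year ratios; the function names and the data are hypothetical.

```python
def rate_of_averages(citations_per_year, items_per_year):
    """Total citations divided by total items (a ratio of sums)."""
    return sum(citations_per_year) / sum(items_per_year)


def average_of_rates(citations_per_year, items_per_year):
    """Mean of the per-year citations-per-item ratios."""
    rates = [c / n for c, n in zip(citations_per_year, items_per_year)]
    return sum(rates) / len(rates)


# Hypothetical two-year window for one journal.
citations = [90, 70]
items = [30, 70]

# Same journal, with 20 never-cited items added to the second year.
items_padded = [30, 90]

print(rate_of_averages(citations, items))         # 1.6
print(average_of_rates(citations, items))         # 2.0
print(rate_of_averages(citations, items_padded))  # lower than 1.6
print(average_of_rates(citations, items_padded))  # lower than 2.0
```

The numerators are unchanged by the added items, so both variants can only fall when never-cited items enter the denominator.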
The SOAP (Study of Open Access Publishing) project has run a large-scale
survey of researchers' attitudes toward, and experiences with, open
access publishing. Around forty thousand answers were collected across
disciplines and around the world, showing overwhelming support for the idea
of open access, while highlighting funding and (perceived) quality as the main
barriers to publishing in open access journals. This article serves as an
introduction to the survey and presents this and other highlights from a
preliminary analysis of the survey responses.
Using citation analysis, sets of documents can be compared as independent
samples; for example, in terms of average citation counts using potentially
different reference sets. From this perspective, the size of samples matters
only for the identification of significant differences and estimating margins
of error. Using the percentile rank approach, differences among citation
distributions can be studied non-parametrically and in a single scheme.
Comparison among the sets clarifies that the different sizes of samples affect
the weighting of the probabilities and therefore the rankings.
Devising an appropriate scheme that assigns the weights to share credits
among multiple authors of a paper is a challenging task. This challenge comes
from the fact that different types of conventions might be followed among
different research disciplines or research groups. In this paper, we argue
that for the purpose of evaluating the quality of research produced by authors,
one can resequence either authors or weights and can apply a weight assignment
policy which the evaluator deems fit for the particular research discipline or
research group.
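A minimal sketch of this idea follows. The harmonic weighting policy and the function names are assumptions chosen for illustration; the text does not prescribe any particular policy.

```python
def harmonic_weights(n):
    """One possible policy: the k-th position gets weight proportional to 1/k."""
    raw = [1.0 / k for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def assign_credit(authors, weights, author_order=None):
    """Map authors to weights by position.

    author_order is an optional list of indices that resequences the byline
    before the weights are applied (hypothetical parameter).
    """
    if len(authors) != len(weights):
        raise ValueError("need exactly one weight per author")
    if author_order is None:
        ordered = list(authors)
    else:
        ordered = [authors[i] for i in author_order]
    return dict(zip(ordered, weights))


byline = ["A", "B", "C"]
weights = harmonic_weights(len(byline))

# Convention 1: credit decreases with byline position.
print(assign_credit(byline, weights))

# Convention 2: the evaluator treats the last author as the lead contributor.
print(assign_credit(byline, weights, author_order=[2, 0, 1]))
```

Resequencing the weights instead of the authors would give the same result; the point is only that the ordering step is kept separate from the weighting policy.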
Colombian scientific journals are poorly represented in international digital
libraries; however, through Google Scholar (GS) it is possible to determine
their use by the community. Between the years of 2003 and 2007 a classification
of 185 Colombian journals indexed in the Colombian National Bibliographical
Index (IBNP) was performed using the information provided by GS, basing
categorization on size indicators, indexation and citation. The indicators were
analyzed by grouping the journals in two general areas: sciences and social
sciences.
As the Distributed Collection Manager's work on building tools to support
users maintaining collections of changing web-based resources has progressed,
questions about the characteristics of people's collections of web pages have
arisen. Simultaneously, work in the areas of social bookmarking, social news,
and subscription-based technologies has been taking the existence, usage, and
utility of this data for granted with neither investigation into what people
are doing with their collections nor how they are trying to maintain them.
It is popular nowadays to bring techniques from bibliometrics and
scientometrics into the world of digital libraries to analyze the collaboration
patterns and explore mechanisms which underlie community development. In this
paper we use the DBLP data to investigate authors' scientific careers and
provide an in-depth exploration of some of the computer science communities. We
compare them in terms of productivity, population stability and collaboration
trends. In addition, we use these features to compare the sets of top-ranked
conferences with their lower-ranked counterparts.
We analyze the citation distributions of all papers published in Physical
Review journals between 1985 and 2009. The average number of citations received
by papers published in a given year and in a given field is computed. Large
variations are found, showing that it is not fair to compare citation numbers
across fields and years. However, when a rescaling procedure by the average is
used, it is possible to compare articles impartially across years and fields.
We make the rescaling factors available, for use by the readers.
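The rescaling step can be sketched as below, assuming that each paper's citation count is divided by the average count of papers from the same field and year. The record layout and the data are hypothetical.

```python
from collections import defaultdict


def rescale_by_group_average(papers):
    """Divide each paper's citations by the mean of its (field, year) group.

    papers: list of dicts with keys 'field', 'year', 'citations'.
    """
    totals = defaultdict(lambda: [0, 0])  # (field, year) -> [citation sum, paper count]
    for p in papers:
        group = totals[(p["field"], p["year"])]
        group[0] += p["citations"]
        group[1] += 1

    rescaled = []
    for p in papers:
        total, count = totals[(p["field"], p["year"])]
        mean = total / count
        # A group with no citations at all has no meaningful rescaling factor.
        rescaled.append(p["citations"] / mean if mean > 0 else 0.0)
    return rescaled


# Hypothetical records: a highly cited field/year and a sparsely cited one.
papers = [
    {"field": "A", "year": 1990, "citations": 10},
    {"field": "A", "year": 1990, "citations": 30},
    {"field": "B", "year": 2005, "citations": 2},
    {"field": "B", "year": 2005, "citations": 6},
]

print(rescale_by_group_average(papers))  # [0.5, 1.5, 0.5, 1.5]
```

After rescaling, the paper with 30 citations in field A and the paper with 6 citations in field B receive the same score, since each has 1.5 times the average of its own group.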
The networking ability of journals reflects their academic influence among
peer journals. This paper analyzes the cited and citing environments of the
journal--Advances in Atmospheric Sciences--using methods from social network
analysis. The journal has been actively participating in the international
journal environment, but it has a tendency to cite papers published in
international journals. Advances in Atmospheric Sciences is intensely
interrelated with international peer journals in terms of similar citing
patterns.
Solutions to the classic problems of dealing with heterogeneous data and
making entire collections interoperable while ensuring that any annotation,
which includes the recognition-and-reward system of scientific publishing, need
to fit into a seamless beginning to end to attract large numbers of end users.
The latest trend in Web applications encourages highly interactive Web sites
with rich user interfaces featuring content integrated from various sources
around the Web.
Integration of the scientific literature into a biomedical research
infrastructure requires the processing of the literature, identification of the
contained named entities (NEs) and concepts, and representation of the content in a
standardised way. The CALBC project partners (PPs) have produced a large-scale
annotated biomedical corpus with four different semantic groups through the
harmonisation of annotations from automatic text mining solutions (Silver
Standard Corpus, SSC).
A Knowledge Management System developed for supporting creation, capture,
storage and dissemination of information about Epilepsy and Epileptic Seizures
is presented. We present an Ontology on Epilepsy and a Web-based prototype that
together create the KMS.
Grasping the fruits of "emerging technologies" is an objective of many
government priority programs in a knowledge-based and globalizing economy. We
use the publication records (in the Science Citation Index) of two emerging
technologies to study the mechanisms of diffusion in the case of two innovation
trajectories: small interference RNA (siRNA) and nano-crystalline solar cells
(NCSC). Methods for analyzing and visualizing geographical and cognitive
diffusion are specified as indicators of different dynamics.
Can (scientific) knowledge be reliably preserved over the long term? Today we
have very efficient and reliable methods to encode, store and retrieve data in
a storage medium that is fault tolerant against many types of failures. But
does this guarantee -- or does it even seem likely -- that all knowledge can be
preserved over tens of thousands of years and beyond? History shows that many
types of knowledge that were known before have been lost.
We continue investigation of the effect of position in announcements of newly
received articles, a single-day artifact, with citations received over the
course of ensuing years. Earlier work [arXiv:0907.4740, arXiv:0805.0307]
focused on the "visibility" effect for positions near the beginnings of
announcements, and on the "self-promotion" effect associated with authors
intentionally aiming for these positions, with both found correlated with a later
enhanced citation rate.
In their article, entitled "Towards a new crown indicator: some theoretical
considerations," Waltman et al.
This paper deals with the semantic interpretation of information resources
(e.g., images, videos, 3D models). We present a case study of an approach based
on semantic and context-dependent similarity applied to industrial design.
Different application contexts are considered and modelled to browse a
repository of 3D digital objects according to different perspectives. The paper
briefly summarises the basic concepts behind the semantic similarity approach
and illustrates its application and results.
As part of its program of 'Excellence in Research for Australia' (ERA), the
Australian Research Council ranked journals into four categories (A*, A, B, C)
in preparation for their performance evaluation of Australian universities. The
ranking is important because it is likely to have a major impact on publication
choices and research dissemination in Australia. The ranking is problematic
because it is evident that some disciplines have been treated very differently
than others.
We present animations based on the aggregated journal-journal citations of
Leonardo during the period 1974-2008. Leonardo is mainly cited by journals
outside the arts domain for cultural reasons, for example, in neuropsychology
and physics. Articles in Leonardo itself cite a large number of journals, but
with a focus on the arts. Animations at this level of aggregation enable us to
show the history of the journal from a network perspective.
Introduction. The purpose of this work is the evaluation of responsiveness
when remote users communicate with a human-readable knowledge base (KB).
Responsiveness [R(s)] is considered here as a measure of service quality.
Method. The preferred method is operational analysis, a variation of classical
stochastic theory, which allows for the study of user-system interaction with
minimal computational effort. Analysis. The analysis is based on well-known
performance metrics, such as service ability, elapsed time, and throughput:
from these metrics estimates of R(s) are derived analytically.
Algorithmic historiography was proposed by Eugene Garfield in collaboration
with Irving Sher in the 1960s, but further developed only recently into
HistCite^{TM} with Alexander Pudovkin. As in history writing, HistCite^{TM}
reconstructs by drawing intellectual lineages. In addition to cited references,
however, documents can be attributed a multitude of other variables such as
title words, keywords, journal names, author names, and even full texts.
This paper is a reply to the article "Scopus's Source Normalized Impact per
Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of
Citations", published by Loet Leydesdorff and Tobias Opthof (arXiv:1004.3580v2
[cs.DL]).
Formal mathematics has so far not taken full advantage of ideas from
collaborative tools such as wikis and distributed version control systems
(DVCS). We argue that the field could profit from such tools, serving both
newcomers and experts alike. We describe a preliminary system for such
collaborative development based on the Git DVCS. We focus, initially, on the
Mizar system and its library of formalized mathematics.
Building a repository of proof-checked mathematical knowledge is without any
doubt a lot of work, and besides the actual formalization process there also is
the task of maintaining the repository. Thus it seems obvious to keep a
repository as small as possible; in particular, each piece of mathematical
knowledge should be formalized only once. In this paper, however, we claim that
it might be reasonable or even necessary to duplicate knowledge in a
mathematical repository. We analyze different situations and reasons for doing
so and provide a number of examples supporting our thesis.
After two decades of repository development, some conclusions may be drawn as
to which type of repository and what kind of service best supports digital
scholarly communication, and thus the production of new knowledge. Four types
of publication repository may be distinguished, namely the subject-based
repository, research repository, national repository system and institutional
repository. Two important shifts in the role of repositories may be noted. With
regard to content, a well-defined and high quality corpus is essential.
Implementation of an editing process for Content MathML formulas in common
visual style is a real challenge for a software developer who does not really
want the user to have to understand the structure of Content MathML in order to
edit an expression, since it is expected that users are often not that
technically minded. In this paper, we demonstrate how this aim is achieved in
the context of the Formulator project and discuss features of this MathML
editor, which provides a user with a WYSIWYG editing style while authoring
MathML documents with Content or mixed markup.
The question of citation behavior has always intrigued scientists from
various disciplines. While general citation patterns have been widely studied
in the literature we develop the notion of citation projection graphs by
investigating the citations among the publications that a given paper cites.
We mark up a corpus of LaTeX lecture notes semantically and expose them as
Linked Data in XHTML+MathML+RDFa. Our application makes the resulting documents
interactively browsable for students. Our ontology helps to answer queries from
students and lecturers, and paves the path towards an integration of our corpus
with external sites.
We present an empirical comparison between two normalization mechanisms for
citation-based indicators of research performance. These mechanisms aim to
correct for the field and the year in which a publication was published. One
mechanism is applied in the current crown indicator of our institute. The other
mechanism is applied in the new crown indicator that our institute is planning
to adopt. We find that at high aggregation levels, such as at the level of
large institutes or at the level of countries, the differences between the two
mechanisms are very small.
The article "Caveats for the journal and field normalizations in the CWTS
(`Leiden') evaluations of research performance", published by Tobias Opthof and
Loet Leydesdorff (arXiv:1002.2769) deals with a subject as important as the
application of so called field normalized indicators of citation impact in the
assessment of research performance of individual researchers and research
groups. Field normalization aims to account for differences in citation
practices across scientific-scholarly subject fields.
SWiM is a semantic wiki for collaboratively building, editing and browsing
mathematical knowledge represented in the domain-specific structural semantic
markup language OMDoc. It motivates users to contribute to collections of
mathematical knowledge by instantly sharing the benefits of knowledge-powered
services with them. SWiM is currently being used for authoring content
dictionaries, i.e. collections of uniquely identified mathematical symbols,
and prepared for managing a large-scale proof formalisation effort.
At this http URL, the OpenMath 2 and 3 Content Dictionaries are
accessible via a semantic wiki interface, powered by the SWiM system. We
briefly introduce the inner workings of the system, then describe how to use
it, and conclude with first experiences gained from OpenMath society members
working with the system and an outlook to further development plans.
The basic classification techniques for organizing information are thesauri,
taxonomy and faceted classification. The topic map is a relatively new entrant to
this information space. The topic map standard describes how complex relationships
between abstract concepts and real world resources can be represented using XML
syntax.
Dereferencing a URI returns a representation of the current state of the
resource identified by that URI. But on the Web, representations of prior
states of a resource are also available, for example, as resource versions in
Content Management Systems or archival resources in Web Archives such as the
Internet Archive. This paper introduces a resource versioning mechanism that is
fully based on HTTP and uses datetime as a global version indicator.
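As a rough sketch of what using a datetime as the version indicator in an HTTP request could look like, the helper below formats a timestamp as an HTTP date and places it in an Accept-Datetime request header. The header name comes from the Memento specification (RFC 7089) rather than from the text above, and the function name is hypothetical; no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_negotiation_headers(when):
    """Build request headers asking for the state of a resource at `when`.

    Assumption: the server understands the Accept-Datetime header.
    """
    if when.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    # HTTP dates are expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


wanted = datetime(2010, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(datetime_negotiation_headers(wanted))
# {'Accept-Datetime': 'Fri, 01 Jan 2010 12:00:00 GMT'}
```

The same URI is requested in every case; only the header value changes, which is what lets a datetime act as a version indicator that is independent of any one server's versioning scheme.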
As Digital Libraries (DL) become more aligned with the web architecture,
their functional components need to be fundamentally rethought in terms of URIs
and HTTP. Annotation, a core scholarly activity enabled by many DL solutions,
exhibits a clearly unacceptable characteristic when existing models are applied
to the web: due to the representations of web resources changing over time, an
annotation made about a web resource today may no longer be relevant to the
representation that is served from that same resource tomorrow.
VOS is a new mapping technique that can serve as an alternative to the
well-known technique of multidimensional scaling. We present an extensive
comparison between the use of multidimensional scaling and the use of VOS for
constructing bibliometric maps. In our theoretical analysis, we show the
mathematical relation between the two techniques.
We present a theoretical and empirical analysis of a number of bibliometric
indicators of journal performance. We focus on three indicators in particular,
namely the Eigenfactor indicator, the audience factor, and the influence weight
indicator. Our main finding is that the last two indicators can be regarded as
special cases of the first indicator. We also find that the three
indicators can be nicely characterized in terms of two properties.
The crown indicator is a well-known bibliometric indicator of research
performance developed by our institute. The indicator aims to normalize
citation counts for differences among fields. We critically examine the
theoretical basis of the normalization mechanism applied in the crown
indicator. We also make a comparison with an alternative normalization
mechanism. The alternative mechanism turns out to have more satisfactory
properties than the mechanism applied in the crown indicator. In particular,
the alternative mechanism has a so-called consistency property.
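The difference between the two normalization mechanisms can be illustrated with a short sketch. It assumes the commonly described forms, which the text above does not spell out: one mechanism divides total citations by total expected citations, the other averages the per-publication ratios of actual to expected citations. The function names and numbers are invented.

```python
def ratio_of_sums(citations, expected):
    """Total actual citations divided by total expected citations."""
    return sum(citations) / sum(expected)


def mean_of_ratios(citations, expected):
    """Average of the per-publication actual/expected ratios."""
    ratios = [c / e for c, e in zip(citations, expected)]
    return sum(ratios) / len(ratios)


# Hypothetical unit with two publications:
# one in a low-citation field (expected 1), one in a high-citation field (expected 20).
citations = [2, 20]
expected = [1, 20]

print(ratio_of_sums(citations, expected))   # about 1.05
print(mean_of_ratios(citations, expected))  # 1.5
```

Under the ratio of sums, the publication from the high-citation field dominates the result; under the mean of ratios, both publications count equally, so doubling the field average in the low-citation field is fully reflected.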
We reply to the criticism of Opthof and Leydesdorff [arXiv:1002.2769] on the
way in which our institute applies journal and field normalizations to citation
counts. We point out why we believe most of the criticism is unjustified, but
we also indicate where we think Opthof and Leydesdorff raise a valid point.
The Actor Network represents heterogeneous entities as actants (Callon et
al., 1983; 1986). Although computer programs for the visualization of social
networks increasingly allow us to represent heterogeneity in a network using
different shapes and colors for the visualization, hitherto this possibility
has scarcely been exploited (Mogoutov et al., 2008). In this contribution to
the Festschrift, I study the question of what heterogeneity can add
specifically to the visualization of a network.
A theory of citations should not consider cited and/or citing agents as its
sole subject of study. One is able to study also the dynamics in the networks
of communications. While communicating agents (e.g., authors, laboratories,
journals) can be made comparable in terms of their publication and citation
counts, one would expect the communication networks not to be homogeneous. The
latent structures of the network indicate different codifications that span a
space of possible 'translations'. The various subdynamics can be hypothesized
from an evolutionary perspective.
Social Network Analysis (SNA) of organizations can attract great interest
from government agencies and scientists for its ability to boost translational
research and accelerate the process of converting research to care. For SNA of
a particular disease area, we need to identify the key research groups in that
area by mining the affiliation information from PubMed. This not only involves
recognizing the organization names in the affiliation string, but also
resolving ambiguities to identify the article with a unique organization.
Automatically extracting organization names from the affiliation sentences of
articles related to biomedicine is of great interest to the pharmaceutical
marketing industry, health care funding agencies and public health officials.
It will also be useful for other scientists in normalizing author names,
automatically creating citations, indexing articles and identifying potential
resources or collaborators.
The idea of a World digital mathematics library (DML) has been around since
the turn of the 21st century. We feel that it is time to make it a reality,
starting in a modest way from successful bricks that have already been built,
but with an ambitious goal in mind. After a brief historical overview of
publishing mathematics, an estimate of the size and a characterisation of the
bulk of documents to be included in the DML, we turn to proposing a model for a
Reference Digital Mathematics Library--a network of institutions where the
digital documents would be physically archived.
Current science and technology have produced more and more publicly
accessible scientific data. However, little is known about how the open data
trend impacts a scientific community, specifically in terms of its
collaboration behaviors. This paper aims to enhance our understanding of the
dynamics of scientific collaboration in the open data eScience environment via
a case study of co-author networks of an active and highly cited open data
project, the Sloan Digital Sky Survey.
Inspired by interdisciplinary work touching biology and microtribology, the
authors propose a new, dynamic way of publishing research results, the
establishment of a tree of knowledge and the localisation of scientific
articles on this tree. 'Technomimetics' is proposed as a new method of
knowledge management in science and technology: it shall help find and organise
information in an era of over-information.
We present a novel approach to visually locate bodies of research within the
sciences, both at each moment of time and dynamically. This article describes
how this approach fits with other efforts to locally and globally map
scientific outputs. We then show how these science overlay maps help benchmark,
explore collaborations, and track temporal changes, using examples of
universities, corporations, funding agencies, and research topics.
Computer science is a relatively young discipline combining science,
engineering, and mathematics. The main flavors of computer science research
involve the theoretical development of conceptual models for the different
aspects of computing and the more applicative building of software artifacts
and assessment of their properties. In the computer science publication
culture, conferences are an important vehicle to quickly move ideas, and
journals often publish deeper versions of papers already presented at
conferences.
This paper proposes an indicator of journals' scientific prestige, the SJR
indicator, for ranking scholarly journals based on citation weighting schemes
and eigenvector centrality to be used in complex and heterogeneous citation
networks such as Scopus. Its computation methodology is described and the results
after implementing the indicator over the Scopus 2007 dataset are compared to an
ad-hoc Journal Impact Factor both generally and inside specific scientific
areas.
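As a rough illustration of the eigenvector-centrality idea behind prestige
indicators of this kind, the sketch below runs a PageRank-style power iteration
over a small citation matrix. It is not the SJR computation itself: the
function name `prestige_scores`, the damping value of 0.85, and the
three-journal matrix are assumptions made only for this example.

```python
def prestige_scores(citations, damping=0.85, iterations=100):
    """Return PageRank-style prestige scores for a citation matrix.

    citations[i][j] is the number of citations from journal i to journal j.
    The damping factor is an illustrative assumption, not a value from SJR.
    """
    n = len(citations)
    out_totals = [sum(row) for row in citations]
    scores = [1.0 / n] * n  # start from a uniform distribution

    for _ in range(iterations):
        # Every journal receives a small baseline share of prestige.
        new_scores = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_totals[i] == 0:
                # A journal that cites nothing spreads its prestige evenly.
                for j in range(n):
                    new_scores[j] += damping * scores[i] / n
            else:
                # Otherwise prestige flows in proportion to citations given.
                for j in range(n):
                    weight = citations[i][j] / out_totals[i]
                    new_scores[j] += damping * scores[i] * weight
        scores = new_scores
    return scores


# Made-up counts for three hypothetical journals A, B, C (rows cite columns).
example_matrix = [
    [0, 2, 8],
    [1, 0, 9],
    [5, 5, 0],
]
example_scores = prestige_scores(example_matrix)
```

The scores sum to one, so each value can be read as a share of the total
prestige circulating in the network; a citation from a high-scoring journal
transfers more prestige than one from a low-scoring journal.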
We review the empirical literature on access to scholarly information. This
review focuses on surveys of authors, article download and citation analysis.
The possibilities of using the Arts & Humanities Citation Index (A&HCI) for
journal mapping have not been sufficiently recognized because of the absence of
a Journal Citation Report (JCR) for this database.
In this paper, we describe our decade-long experience of building and
operating one of the most active Institutional Repositories in the world:
www.saber.ula.ve (University of Los Andes,
Merida-Venezuela). In order to share our experience with other institutions, we
first explain the steps we followed to preserve and disseminate the
scientific production of the University of Los Andes' researchers.
The profusion of online digital images presents new challenges for image
indexing. Images have always been problematic to describe and catalogue due to
lack of inherent textual data and ambiguity of meaning. An alternative to
time-consuming professionally-applied metadata has been sought in the form of
tags, simple keywords that form a flat structure known as distributed
classification, or more popularly as a folksonomy.
Seismology has several features that suggest it is a highly internationalized
field: the subject matter is global, the tools used to analyse seismic waves
are dependent upon information technologies, and governments are interested in
funding cooperative research. We explore whether an emerging field like
seismology has a more internationalised structure than the older, related field
of geophysics. Using aggregated journal-journal citations, we first show that,
within the citing environment, seismology emerged from within geophysics as its
own field in the 1990s.
When using scientific literature to model scholarly discourse, a research
specialty can be operationalized as an evolving set of related documents. Each
publication can be expected to contribute to the further development of the
specialty at the research front.
The aggregated citation relations among journals included in the Science
Citation Index provide us with a huge matrix which can be analyzed in various
ways. Using principal component analysis or factor analysis, the factor scores
can be used as indicators of the position of the cited journals in the citing
dimensions of the database. Unrotated factor scores are exact, and the
extraction of principal components can be made stepwise since the principal
components are independent. Rotation may be needed for the designation, but in
the rotated solution a model is assumed.
From its inception, a large part of the motivation for Cognitive Science has
been the need for an interdisciplinary journal for the study of minds and
intelligent systems. One threat to the interdisciplinarity of Cognitive
Science, both the field and journal, is that it may become, or already be, too
dominated by psychologists. In 2005, psychology was a keyword for 51% of
submissions, followed distantly by linguistics (17%), artificial intelligence
(13%), neuroscience (10%), computer science (9%), and philosophy (8%).
Aggregated journal-journal citation networks based on the Journal Citation
Reports 2004 of the Science Citation Index (5968 journals) and the Social
Science Citation Index (1712 journals) are made accessible from the perspective
of any of these journals. The user is thus able to analyze the citation
environment in terms of links and graphs. Furthermore, the local impact of a
journal is defined as its share of the total citations in the specific
journal's citation environments; the vertical size of the nodes is varied
proportionally to this citation impact.
Recently, aggregated journal-journal citation networks were made accessible
from the perspective of each journal included in the Science Citation Index.
The local matrices can be used to inspect the
relevant citation environment of a journal using statistical analysis and
visualization techniques from social network analysis. The inspection answers
the question of what the local impact of this and other journals in the
environment is. In this study the citation environment of Angewandte Chemie was
analysed.
The citation impact of Environment and Planning B can be visualized using its
citation relations with journals in its environment as the links of a network.
The size of the nodes is varied in correspondence to the relative citation
impact in this environment. Additionally, one can correct for the effect of
within-journal "self"-citations. The network can be partitioned and clustered
using algorithms from social network analysis.
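The relative citation impact described above, with and without the correction
for within-journal self-citations, can be sketched as a share of citations
received in the local environment. The function name `local_impact_shares`, the
`exclude_self` flag, and the three-journal matrix are hypothetical and serve
only as an illustration under these assumptions.

```python
def local_impact_shares(names, citations, exclude_self=False):
    """Return each journal's share of citations received in the environment.

    citations[i][j] is the number of citations from journal i to journal j.
    With exclude_self=True the diagonal (self-citations) is ignored.
    """
    n = len(names)
    received = []
    for j in range(n):
        # Column sum: citations that journal j receives from the environment.
        total = sum(
            citations[i][j]
            for i in range(n)
            if not (exclude_self and i == j)
        )
        received.append(total)

    grand_total = sum(received)
    if grand_total == 0:
        return {name: 0.0 for name in names}
    return {name: received[j] / grand_total for j, name in enumerate(names)}


# Made-up counts for three hypothetical journals (rows cite columns).
journal_names = ["A", "B", "C"]
journal_citations = [
    [10, 2, 3],
    [4, 20, 1],
    [6, 2, 5],
]
with_self = local_impact_shares(journal_names, journal_citations)
without_self = local_impact_shares(
    journal_names, journal_citations, exclude_self=True
)
```

In a network drawing, these shares would set the node sizes; comparing the two
dictionaries shows how strongly a journal's apparent impact depends on its own
self-citations.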
Can change in citation patterns among journals be used as an indicator of
structural change in the organization of the sciences? Aggregated
journal-journal citations for 1999 are compared with similar data in the
Journal Citation Reports 1998 of the Science Citation Index. In addition to
indicating local change, probabilistic entropy measures enable us to analyze
changes in distributions at different levels of aggregation. The results of
various statistics are discussed and compared by elaborating the
journal-journal mappings.
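One common probabilistic entropy measure for comparing two distributions is the
Kullback-Leibler divergence, which gives the expected information (in bits) of
the message that an earlier distribution has changed into a later one. The
abstract does not spell out its formulas, so the sketch below is an assumption
about the kind of measure involved; the function name and the sample counts are
hypothetical.

```python
import math


def probabilistic_entropy(prior_counts, posterior_counts):
    """Return sum(q * log2(q / p)) in bits for two count vectors.

    prior_counts holds the earlier distribution (p) and posterior_counts
    the later one (q), e.g. citations to the same journals in two years.
    """
    if len(prior_counts) != len(posterior_counts):
        raise ValueError("both distributions need the same categories")
    prior_total = sum(prior_counts)
    posterior_total = sum(posterior_counts)
    if prior_total == 0 or posterior_total == 0:
        raise ValueError("distributions must not be empty")

    bits = 0.0
    for prior, posterior in zip(prior_counts, posterior_counts):
        q = posterior / posterior_total
        p = prior / prior_total
        if q == 0:
            continue  # a zero posterior term contributes nothing
        if p == 0:
            raise ValueError("posterior mass on a category with zero prior")
        bits += q * math.log2(q / p)
    return bits


# Made-up citation counts to two journals in two consecutive years.
change_in_bits = probabilistic_entropy([1, 1], [3, 1])
```

The measure is zero when the two distributions are identical and grows as they
diverge. It is not symmetric in its two arguments.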
Based on the citation data of journals covered by the China Scientific and
Technical Papers and Citations Database (CSTPCD), we obtained aggregated
journal-journal citation environments by applying routines developed
specifically for this purpose. Local citation impact of journals is defined as
the share of the total citations in a local citation environment, which is
expressed as a ratio and can be visualized by the size of the nodes.
The journal structure in the China Scientific and Technical Papers and
Citations Database (CSTPCD) is analysed from three perspectives: the database
level, the specialty level and the institutional level (i.e., university
journals versus journals issued by the Chinese Academy of Sciences). The
results are compared with those for (Chinese) journals included in the Science
Citation Index. The frequency of journal-journal citation relations in the
CSTPCD is an order of magnitude lower than in the SCI.
This paper explores a new indicator of journal citation impact, denoted as
source normalized impact per paper (SNIP). It measures a journal's contextual
citation impact, taking into account characteristics of its properly defined
subject field, especially the frequency at which authors cite other papers in
their reference lists, the rapidity of maturing of citation impact, and the
extent to which a database used for the assessment covers the field's
literature.
Purpose: to provide a view and analysis of the immediate field of journals
which surround a number of key heterodox economics journals.
Design/methodology/approach: Using citation data from the Science and Social
Science Citation Index, the individual and collective networks of a number of
journals in this field are analyzed. Findings: The size and shape of the
citation networks of journals can differ substantially, even if in a broadly
similar category.
The Eigenfactor Metrics provide an alternative way of evaluating scholarly
journals based on an iterative ranking procedure analogous to Google's PageRank
algorithm. These metrics have recently been adopted by Thomson-Reuters and are
listed alongside the Impact Factor in the Journal Citation Reports. But do
these metrics differ sufficiently so as to be a useful addition to the
bibliometric toolbox? Davis (2008) has argued otherwise, based on his finding
of a 0.95 correlation coefficient between Eigenfactor score and total citations
for a sample of journals in the field of medicine.
This research analyzes a "who cites whom" matrix in terms of aggregated,
journal-journal citations to determine the location of communication studies on
the academic spectrum. Using the Journal of Communication as the seed journal,
the 2006 data in the Journal Citation Reports are used to map communication
studies. The results show that social and experimental psychology journals are
the most frequently used sources of information in this field.
The current communication presents a simple exercise with the aim of solving
a singular problem: the retrieval of extremely large numbers of items from the
Web of Science interface. As is well known, the Web of Science interface allows a
user to obtain at most 100,000 items from a single query. But what about
queries that return more than 100,000 items? The exercise
developed one possible way to achieve this objective. The case study is the
retrieval of the entire scientific production from the United States in a
specific year.
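The abstract does not state how the query was divided, so the following is only
one possible approach, sketched under stated assumptions: the full result set is
cut into slices along some facet (the facet labels and counts here are
hypothetical), and neighbouring slices are grouped into batches that each stay
within the 100,000-item limit. The function name `plan_batches` is invented for
this example, and no real Web of Science API is called.

```python
def plan_batches(slice_counts, cap=100_000):
    """Group consecutive (label, count) slices into batches of at most cap items.

    Raises ValueError when a single slice already exceeds the cap, since
    that slice would have to be cut more finely before it can be retrieved.
    """
    batches = []
    current_labels = []
    current_total = 0

    for label, count in slice_counts:
        if count > cap:
            raise ValueError(f"slice {label!r} exceeds the cap; split it further")
        if current_total + count > cap:
            # Close the running batch and start a new one with this slice.
            batches.append(current_labels)
            current_labels = []
            current_total = 0
        current_labels.append(label)
        current_total += count

    if current_labels:
        batches.append(current_labels)
    return batches


# Made-up slice sizes; each label stands for one sub-query of the full search.
example_batches = plan_batches(
    [("a", 60_000), ("b", 50_000), ("c", 40_000), ("d", 100_000)]
)
```

Each resulting batch corresponds to one query whose result count fits under the
limit, so the union of all batches recovers the complete set, provided the
slices do not overlap.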
With the aim of building intelligent search systems for e-libraries or
online bookstores, we have proposed a search system model based on a
Vietnamese language query processing component. Document search systems
based on this model allow users to enter Vietnamese queries that represent
content information, instead of entering keywords to search
specific fields in a database.
In the process of scientific research, many information objects are
generated, all of which may remain valuable indefinitely. However, artifacts
such as instrument data and associated calibration information may have little
value in isolation; their meaning is derived from their relationships to each
other. Individual artifacts are best represented as components of a life cycle
that is specific to a scientific research domain or project.
Preserving access to file content requires preserving not just bits but also
meaningful logical structures. The Data Format Description Language (DFDL),
currently under development, is a completely general standard that addresses
this need. The Defuddle parser is a generic parser that can use DFDL-style
format descriptions to extract logical structures from ASCII or binary files
written in those formats. DFDL and Defuddle provide a preservation capability
that has minimal format-specific software and cleanly separates issues related
to bits, formats, and logical content.
This paper provides an overview (in French) of the European PEER project,
focusing on its origins, the actual objectives and the technical deployment.
Designing and implementing comprehensive IT-based support environments for KM
in organizations is fraught with many problems. Solving them requires intimate
knowledge of information usage in knowledge work and of the scope of
technology intervention. In this paper, the Task-oriented Organizational
Knowledge Management or TOKM, a design theory for building integrated IT
platforms for supporting organizational KM, is proposed. TOKM brings together
two apparently mutually exclusive practices of building KM systems, the
task-based approach and the generic or universalistic approach.
One of the most important goals of information management (IM) is supporting
knowledge workers in performing their work. In this paper we examine
issues of relevance, linkage and provenance of information, as accessed and
used by knowledge workers. These are usually not adequately addressed in
most IT-based solutions for IM. Here we propose a non-conventional
approach to building information systems for supporting knowledge workers
which addresses these issues.
The NASA Astrophysics Data System (ADS), along with astronomy's journals and
data centers (a collaboration dubbed URANIA), has developed a distributed
on-line digital library which has become the dominant means by which
astronomers search, access and read their technical literature. Digital
libraries such as the NASA Astrophysics Data System permit the easy
accumulation of a new type of bibliometric measure, the number of electronic
accesses ("reads") of individual articles. We explore various aspects of this
new measure.
By combining data from the text, citation, and reference databases with data
from the ADS readership logs we have been able to create Second Order
Bibliometric Operators, a customizable class of collaborative filters which
permits substantially improved accuracy in literature queries.
I outline the involvement of the Los Alamos e-print archive (arXiv) within the Open Archives Initiative (OAI) and describe the implementation of the data provider side of the OAI protocol v1.0. I highlight the ways in which we map the existing structure of arXiv onto elements of the protocol.
Work in the Open Archives Initiative - Object Reuse and Exchange (OAI-ORE)
focuses on an important aspect of infrastructure for eScience: the
specification of the data model and a suite of implementation standards to
identify and describe compound objects. These are objects that aggregate
multiple sources of content including text, images, data, visualization tools,
and the like. These aggregations are an essential product of eScience, and will
become increasingly common in the age of data-driven scholarship. The OAI-ORE