Conservation GIS Help   

GeoData Models for Conservation & Ecology


-What Are Data Models:
ESRI is undertaking a new initiative to provide its user base with essential data models that follow a framework that is easily implemented by end users. These data models will be developed for a number of industries, disciplines, and categories of geographic phenomena, providing a template composed of standard data classifications and feature data models.

-ESRI GeoData Models: The geodatabase object-modeling capabilities included in ArcInfo 8 provide an opportunity to develop and share standard data models and templates in a variety of fields. These standard models will promote the sharing and exchange of data and designs for all software users, including ArcView.

-Modeling Our World: (by ESRI Press) is the comprehensive guide and reference to GIS data modeling in general, and to the geodatabase model in particular.

-ArcGIS Water Facilities Model: The first data model from the ArcGIS Data Model initiative is called ArcFM Water, focused on the water/wastewater industry.

ArcGIS Biodiversity Model: A data model for the conservation of biodiversity under the standards and practices initially developed by The Nature Conservancy is under development in cooperation with the Association for Biodiversity Information (ABI), a global network of Natural Heritage Programs and Databases.

ArcGIS Forestry Data Model: the ESRI Forestry Spatial Interest Group (FSIG), headed by Potlatch Corporation, has published its first white paper on data modelling standards.

ArcGIS Hydrology Model: This effort is being led by Dr. David Maidment of the University of Texas at Austin. Their first publication is Hydrologic and Hydraulic Modeling Support with Geographic Information Systems (ESRI Press).


Data Models in Conservation and GIS: The ESRI Conservation Program is beginning work on a general conservation data model which will describe very basic objects and relationships for Conservation GIS in the widest sense, including: Taxonomy, Vegetation, Physiognomy, Plot data, Habitat Relations, Energy Cycles, Matter Cycles, Dispersal and Migration Dynamics, Evolutionary Dynamics, Autecology, Synecology, Biogeography, Source/Authority tracking and bibliography, Conventions, Laws and Mandates, Organizational memberships and demographics, GIS data libraries.

Partners and contributors from a wide variety of fields are being solicited for input and suggestions, and the model/CD is set for publication in 2001.

A skeleton overview of one of the antecedents to this type of wide-ranging conservation model can be seen below, the integrated data modelling tutorial developed by Charles Convis for the Conservation Data Manager Project in 1995. Note in particular the example schema diagram at the end, which contains many of the objects and relationships which will appear in the new Conservation GeoData Model.


Conservation Data Manager Design Concept Tutorial
(Charles Convis, ESRI Conservation Program, 1995)


WITHOUT DESIGN

- As databases grow they become less useful

- Relationships between data are undefined

- Classification systems may not match

- Databases difficult to add on to

- Application software difficult to modify

- User requirements don't guide the process

- Data products very likely will not be used

- Software and applications will likely not be used

- Minimal and decreasing benefit to science

The requirements of conservation and science dictate that as new knowledge is gained it should impact upon and enrich many other areas of study. New data about a plant's ecology should enhance our understanding of its evolution, taxonomy, and threats. In the world of databases, however, this promise has not been fulfilled. In general, scientific databases have suffered from too much variation in standards and structure to be readily combined with other data. The typical scientific database is used within a very narrow realm pertaining to its specific content, or not at all. Attempts to combine that data with data from other disciplines, or to fit it into an integrated data collection or management scheme, often fail. The size, number and variety of such databases continue to grow, meaning that even in the midst of increasing amounts of data, less and less of it is actually useful for conservation.
The variety of custom software tools for scientific data is also increasing. Generally these tools are welded to a specific data structure and designed as black boxes, with little opportunity for local modification. As a result, they suffer from the same weaknesses as databases in terms of only being useful within a very narrow discipline by a very narrow user audience.
Design is a structured process of planning for both databases and software which corrects most of these problems:


WITH DESIGN

- Databases part of a whole concept, as they grow they enrich that concept

- Relationships between data are well-defined

- Classification systems relate in a defined and reproducible manner

- Databases straightforward to add on to

- Application software modification defined and possible even at novice levels

- User requirements known and guide the process

- Data products very likely will be used

- Software and applications will likely be used

- Maximum and increasing benefit to science

The primary method used to ensure the success of a new GIS effort is design. The design effort is a structured series of activities carried out before and during system development, whose primary goal is to help define the purpose of the new GIS effort, and how it is likely to fit in with the existing flows of information and communications. It results in several useful products, including a flow chart showing graphically how the movement of information and tasks through the existing and proposed system can meet stated goals, and an implementation plan presenting the specific steps needed to build the new system in an efficient and cost-effective manner.

In the design process, classification differences are explicitly addressed and managed. Standards in classification, scale and basemap layers are chosen where appropriate so that all data components have a common reference point allowing them to be linked together. Existing databases and data sources are inventoried along with user needs so that a clear picture of current information status can be used as a starting point. Data tables are analyzed for shared or common attributes, and normalized so that data sharing is possible at all levels and future modifications can be done at minimal cost. Software design is also normalized so that the different pieces of any application can be developed and interchanged independently.

Data products and applications arising from a well-designed effort can therefore fit readily into the existing user environments, so that as new data is gathered and distributed it can provide immediate utility and increasing enhancement to user science.

 

IF DESIGN IS SO GREAT WHY DOESN'T EVERYONE DO IT?

- Relational Designs may be complex and challenging.

- They take a lot more time up front when people are least likely to want to spend it.

- Cost savings and increased utility in the future are difficult to sell today.

- Simple database and application tools suffice for undesigned and stand-alone flat files.

- Powerful database and application tools are needed to support all the requirements of a designed system.

- Without the tools to build databases and applications on those designs then the effort is futile anyway.

The tools and methods of project design have been known since they were first applied to aircraft manufacture during World War II. They have been found to be especially powerful in software and database design and are widely used in big-budget projects where they help save millions in development and maintenance costs. The reason they are not more widely used is mainly cost, both in dollars and time. Most smaller projects are anxious to see software and data products as soon as possible, and have little time or patience for a lengthy initial design effort. Most smaller projects therefore produce software and databases which have limited to zero life outside the scope or timeline of the project itself. Most small projects have little awareness of or concern for the "cost of ownership" of a digital effort, as opposed to the "cost of acquisition". The rule of thumb is that you will spend 100 to 1000 times the cost of acquisition of a software/data product in terms of ongoing support, modification and training just to keep it running over a few years. Design efforts prior to acquisition will cut this figure by orders of magnitude.
Another common problem is that database management tools on the PC have been poor, limiting the kinds of applications that could be developed. Many of them were based on flat-file managers and therefore had very limited capability to handle the more complex relationships typical of an integrated database design.

HOW DO WE GET THERE?

- Traditional GIS design & development process:

Statement of goals

Data Inventory

User Needs Analysis

Conceptual Database design: define, expand, consolidate, review

Physical Database design

Conceptual Application design

Physical Application design

Training Design

Prototype cycle: research, standards, prototype

Implementation of Software, Database, Training

Data Automation

Distribution/publishing

Statement of goals


Defining the purpose of any new information management system must begin with a careful examination of the purpose and parts of the current system, whether manual or automated, looking at what works, what doesn't, and what must change. The first part of this questionnaire looks at your current program independent of any future GIS capacity but using the same analytical approach that you will later use to design your GIS program. A GIS is said to consist of 5 basic elements: People, Data, Procedures to work on the data, Hardware, and Software. People often consider only the last two, so for this exercise we will ignore them. These first three elements can be examined in more detail by breaking them down into the traditional who, what, when, where and why of investigation:

Why are you here: program goals
Who do you serve: your audience
What do you provide: your products
Where does your data come from: data sources
How do you do it: your tasks.
Who helps you: your support
What constrains you: your limitations

We will look at each of these in detail to describe your current system of information management and communication, then look at them again to define what you expect from a new GIS program.

Conceptual Database design: define, expand, consolidate, review

The conceptual design itself will serve as a major guiding document for all database development and software application. It is a living document which will be reviewed continually as new data sources arise, as staff skills expand, and as the GIS evolves. This is a 5-step process:

Define: Define features and entities at a conceptual level

Expand: List attributes and characteristics of each entity

Consolidate: Look for commonalities in the attributes or classifications of the entities

Implement: Lay out the conceptual tables for each entity: DATA DICTIONARY

Review: As GIS data and software development proceed


DEFINE


This is the step where we define the features and entities we will be working with at a conceptual level. One way this can be done is by conducting user interviews and listing for each user the main types of features or entities that need to be handled. Another way is to review standard reports, texts or maps looking for chapter & section headings and legends. "Features" usually refers to things with a spatial component, whereas "entities" can refer to anything.
Example: Define General Categories of things we would like to keep track of:

Ecological data

Park Species Lists, Flora
Fauna
Existing Classification Systems
Field Plots
Data Sources
(maps, imagery, books)
Authors/Experts

Management data

Parks and offices
Contact persons
Logistical concerns
Progress of the mapping effort
Existing database themes
Quad sheets
photo missions

EXPAND


This is where we list the attributes and characteristics of each feature & entity defined above. For each attribute, we also want to ask if there is only one for each entity or if an entity can have many. We also want to indicate which attributes are absolutely required in order for the entity to exist, and we generally want to indicate at least one attribute for this role. If there aren't any completely unique attributes (like with transactions) we'll just make up an internal sequential number.

Example: List the attributes of each entity/feature

Data Source:
- title, only one REQUIRED
- date, only one main date
- authors, can be many, some co-authors
- editor, usually only one
- storage location

Contact Person:
- name: REQUIRED
- address & contact information
- list of parks associated with
- list of skills
- list of publications
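
The attribute lists above can also be written down in a form a program can check. The following is only a minimal sketch in Python: the class names (Attribute, Entity) and the "required" and "many" flags are illustrative assumptions, not part of the original tutorial, and which attributes count as multi-valued follows the wording of the lists above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    required: bool = False  # must be present for the entity to exist
    many: bool = False      # True if one entity can carry several values


@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def required_attributes(self) -> List[str]:
        """Names of the attributes marked REQUIRED."""
        return [a.name for a in self.attributes if a.required]

    def multi_valued(self) -> List[str]:
        """Names of the attributes an entity can have many of."""
        return [a.name for a in self.attributes if a.many]


# The two example entities from the lists above.
data_source = Entity("Data Source", [
    Attribute("title", required=True),
    Attribute("date"),
    Attribute("authors", many=True),
    Attribute("editor"),
    Attribute("storage location"),
])

contact_person = Entity("Contact Person", [
    Attribute("name", required=True),
    Attribute("address & contact information"),
    Attribute("parks", many=True),
    Attribute("skills", many=True),
    Attribute("publications", many=True),
])

print(data_source.required_attributes())
print(contact_person.multi_valued())
```

Multi-valued attributes are worth flagging early, since they are the ones most likely to turn into separate entities or relationships in the next step.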

 

CONSOLIDATE


This is where we go through all of the lists and definitions and look for commonalities. Where two entities list a similar attribute we ask what the differences are, how important they are, and what the implications would be of making them the same. Where an entity lists an attribute that appears to be another entity we ask what the differences are and if the attribute can actually be combined with that entity. What we hope to do is winnow out a final condensed list of entities and features and list for each only those attributes common to every place they appear, then list separately the additional attributes needed to define the characteristics of each relationship but which are not attributes of either side by itself.

Example

Data Source:
- title, only one REQUIRED
- date, only one main date
- authors, can be many, some co-authors = PERSON
- editor, usually only one = PERSON
- storage location

PERSON (old Contact Person)
- name: REQUIRED
- address & contact information
- list of parks associated with = PARK (Relationship attribute = role or title at that park)
- list of skills
- list of publications = DATA SOURCE (Relationship attribute = authorship role in that source)
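
The bookkeeping in this example can be sketched in a few lines of Python. This is an illustration only: the decision that an attribute is "really" another entity is still made by a person and is simply recorded by hand in the dictionary, and the names ENTITIES and consolidate are hypothetical.

```python
# Each attribute maps to None (it stays with its own entity) or to a pair:
# (entity the attribute really refers to, attribute describing the relationship).
# These decisions are entered by hand, following the example above.
ENTITIES = {
    "DATA SOURCE": {
        "title": None,
        "date": None,
        "authors": ("PERSON", "authorship role in that source"),
        "editor": ("PERSON", "authorship role in that source"),
        "storage location": None,
    },
    "PERSON": {
        "name": None,
        "address & contact information": None,
        "parks": ("PARK", "role or title at that park"),
        "skills": None,
        "publications": ("DATA SOURCE", "authorship role in that source"),
    },
}


def consolidate(entities):
    """Split each entity into core attributes and shared relationships."""
    core = {}
    relationships = {}
    for entity, attrs in entities.items():
        # Core attributes are the ones that do not point at another entity.
        core[entity] = [name for name, ref in attrs.items() if ref is None]
        for ref in attrs.values():
            if ref is None:
                continue
            target, rel_attr = ref
            # Sort the pair so PERSON -> DATA SOURCE and DATA SOURCE -> PERSON
            # collapse into a single relationship.
            pair = tuple(sorted((entity, target)))
            relationships.setdefault(pair, set()).add(rel_attr)
    return core, relationships


core, relationships = consolidate(ENTITIES)
print(core)
print(relationships)
```

Authors, editor and publications all collapse into one PERSON / DATA SOURCE relationship carrying a single "authorship role" attribute.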


Data Dictionary: Sample Page

Entity: RED FLAG AREAS: Red-flag areas of special concern in or around a park which will be considered in mapping the park and its environs
File: Redarea.dbf     indexes: rnum, rname

Item        form  definition
+-----------+-----+---------------------------------------
rareanum    n4    red flag area database numeric ID, primary key
rname       c30   name of the site or area
                  i.e. "Johnson farm", "Muir Wilderness"
parkcode    c4    Park ID code where area occurs, foreign key
                  i.e. "YOSE"
rtype       c10   type of red-flag area:
                  i.e. Inholding, Disturbance (fire, avalanche), Scientific study area, Access-limited area, Vegetation Management area, Area of Interest, etc.
rlocation   c30   location description and directions
                  i.e. "Nelson Valley, Nelson Quad, 15m past route23 then 4 mi north on trail"
quadnum     n4    topo quad number
rnotes      m     additional notes on how to handle region.
                  i.e. "Vehicle access prohibited", "Foot access allowed only with permission of landowner contact"
contactnum  n4    contact ID code for info on the area, foreign key

These are individual areas of special concern within or near a park, including the entire area of interest for mapping in the park (this should correspond to the actual area mapped in each park (kvg####.pat) but may not due to cost or logistics). These are any areas which require special logistical considerations for vegetation study and mapping, for ecological, political or whatever reasons. Each record refers to one physical area within a park with a constant type of condition. This may include several contiguous or closely-located polygons as long as they can be treated as a unit for the purposes of permitting, access, study or mapping.
Attributes include those which describe the type of management concern and provide some background notes on how to handle each area.
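
For readers working with a SQL database rather than .dbf files, the sample page might translate roughly as follows. This is a sketch using Python's built-in sqlite3 module and rests on several assumptions: the form codes are read as numeric (n), character (c) and memo (m) fields, the "park" table is a hypothetical one-column stand-in for the parks master file, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE park (
    parkcode TEXT PRIMARY KEY          -- stand-in for the parks master file
);
CREATE TABLE redarea (
    rareanum   INTEGER PRIMARY KEY,    -- n4, primary key
    rname      TEXT,                   -- c30 (SQLite does not enforce width)
    parkcode   TEXT REFERENCES park(parkcode),  -- c4, foreign key
    rtype      TEXT,                   -- c10
    rlocation  TEXT,                   -- c30
    quadnum    INTEGER,                -- n4
    rnotes     TEXT,                   -- m (memo)
    contactnum INTEGER                 -- n4, foreign key to a contact table not shown
);
CREATE INDEX redarea_rname ON redarea (rname);
""")

# Illustrative rows only.
conn.execute("INSERT INTO park VALUES ('YOSE')")
conn.execute(
    "INSERT INTO redarea (rareanum, rname, parkcode, rtype) VALUES (?, ?, ?, ?)",
    (1, "Johnson farm", "YOSE", "Inholding"),
)


def insert_with_unknown_park():
    """Try to add an area for a park code that is not in the park table."""
    try:
        conn.execute(
            "INSERT INTO redarea (rareanum, rname, parkcode) VALUES (?, ?, ?)",
            (2, "Muir Wilderness", "ZZZZ"),  # 'ZZZZ' is a made-up code
        )
    except sqlite3.IntegrityError:
        return False
    return True


accepted = insert_with_unknown_park()
area_count = conn.execute(
    "SELECT COUNT(*) FROM redarea WHERE parkcode = 'YOSE'"
).fetchone()[0]
print(accepted, area_count)
```

Declaring parkcode as a foreign key lets the database itself reject an area that points at a park it does not know about, which is one practical payoff of writing the data dictionary first.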

RELATIONSHIPS
Parks
This is a 1-many relationship to the parks master file (see under parks)
Red-flag areas coverage:
This is a 1-1 relationship to the GIS coverage (red####.pat) for each park recording the actual polygons delineating each special area.

IMPLEMENT

This is where we go through the final list of entities and attributes and lay out conceptual tables for each entity, listing the core attributes that will be kept for that entity and indicating which ones are required and which ones serve to distinguish one record from the next. Since we are relying upon the relational database model, we can also create a table for every relationship defined, which serves to link two entities together. It must therefore contain the primary key attribute from each entity, plus any of the attributes defined as describing that relationship.

An advanced topic worth noting at this point is the issue of scale. The entities defined have an implicit scale, both in spatial and thematic terms, over which they are valid. It is worth being aware of this, and of whether the relationships you are defining are more or less within the same scale (such as quads and rivers in space or river processes and sediment load in theme) or across scales (such as quads and global climate in space or sediment load and continental drift in theme).

Example: Define feature/entity tables and relationship tables

Data Source Table
- title, only one REQUIRED
- date, only one main date
- storage location

PERSON (old Contact Person)
- name: REQUIRED
- address & contact information
- list of skills

PERSON-DATA RELATIONSHIP
- person name
- data source title
- authorship role in that source i.e. author, co-author, editor...
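
The three tables above can be tried out directly in a relational database. The sketch below uses Python's built-in sqlite3 module and makes a few assumptions not in the original: made-up sequential ID columns stand in for the name and title keys (as suggested under EXPAND when no attribute is reliably unique), the table and column names are hypothetical, the skills list is left out for brevity, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE data_source (
    source_id        INTEGER PRIMARY KEY,  -- made-up sequential key
    title            TEXT NOT NULL,        -- REQUIRED
    date             TEXT,
    storage_location TEXT
);
CREATE TABLE person (
    person_id    INTEGER PRIMARY KEY,      -- made-up sequential key
    name         TEXT NOT NULL,            -- REQUIRED
    contact_info TEXT
);
-- Relationship table: one row per person, source and role.
CREATE TABLE person_data (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    source_id INTEGER NOT NULL REFERENCES data_source(source_id),
    role      TEXT NOT NULL,               -- author, co-author, editor...
    PRIMARY KEY (person_id, source_id, role)
);
""")

# Invented sample rows.
conn.executemany(
    "INSERT INTO data_source (source_id, title) VALUES (?, ?)",
    [(1, "Park Flora Survey"), (2, "Vegetation Map Notes")],
)
conn.executemany(
    "INSERT INTO person (person_id, name) VALUES (?, ?)",
    [(1, "A. Author"), (2, "B. Editor")],
)
conn.executemany(
    "INSERT INTO person_data (person_id, source_id, role) VALUES (?, ?, ?)",
    [(1, 1, "author"), (2, 1, "editor"), (1, 2, "co-author")],
)


def people_for(title):
    """Everyone linked to a data source, with their role in it."""
    return conn.execute(
        """SELECT p.name, pd.role
           FROM person_data pd
           JOIN person p ON p.person_id = pd.person_id
           JOIN data_source d ON d.source_id = pd.source_id
           WHERE d.title = ?
           ORDER BY p.name""",
        (title,),
    ).fetchall()


def sources_for(name):
    """Every data source linked to a person, with the role held."""
    return conn.execute(
        """SELECT d.title, pd.role
           FROM person_data pd
           JOIN person p ON p.person_id = pd.person_id
           JOIN data_source d ON d.source_id = pd.source_id
           WHERE p.name = ?
           ORDER BY d.title""",
        (name,),
    ).fetchall()


print(people_for("Park Flora Survey"))
print(sources_for("A. Author"))
```

Because the role lives in the relationship table rather than in either entity, the same person can be author of one source and co-author of another without duplicating any person or source data.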

 

EXAMPLE SCHEMA