Data migration

Within GNUmed

Database and table modifications

Also consider the threads "Best practices for migrating a development database" on psql-general:

Also see this PostgreSQL TidBits issue:

Into GNUmed

One GNUmedder is looking at migrating legacy data residing in FoxBASE .dbf format via dbf2mysql (though it does not handle memo fields) into mysql and from there into postgres.

Below, a synthesis of data migration issues from MS Access, discussed in the gnumed-devel list ~ Jan 25, 2006 within the thread "Catching errors in an app":

It just took me about 2.5 hours yesterday to import my 
clinical demographic database into postgres. Brought home to me once again 
how critical data input validation is in the first place. Secretaries manage 
to f*** up [lots of things], inadvertently of course [not their fault].

> Like Karsten, Ian believes the backend is the best/easiest place to do this.
> GUI-layer validation is nice as user functionality (not letting users type
> chars in the postcode fields, for example) but in terms of data validity
> it's not the main game. GNUmed's current problem is propagating these
> errors back to be user in a nice way is harder than it should be.  

There were probably many dozens of field errors within the 6,500 patients
attached to the practice, things like states in the postcode field, apostrophies
in states field, and worst of all, and I'd love a quick script to fix this if you have 
time, carriage returns in a number of different fields.

This affected the COPY function in postgres quite horribly in that the import 
failed when the field before it had a hidden carriage return(s). It took me a 
while to figure it out, but I"m not smart enough to know how to parse the 
text field to automatically remove them , or if postgres has some mechanism.

I don't know what the CR character is (or if it is a combination in Dos 
files), but I'd love a few line python routine which scans through the entire
file and replaces the CR with a blank space - do in seconds what I took 
manually (via going back to the access db table every time the record 
crashed).

The text data exported from access has pipe delimited format, ie
field|field1|||||||field2||||| etc etc some of which can be null.

> This is actually quite tough, because when is newline a "real" newline
> or a false one inside a field.

> You would need to count the pipes in each line and concatenate lines
> with less than the proper number.

so something like:

l = list (file ("data").readlines ())
l.append ('')
n = 0
while n < len (n) and n[i]:
    if len (n[i].split('|')) < 20: # or whatever the number of fields is
       n[i] += n[i+1]
       del n[i+1]
    else:
       n += 1

f= file ("data", "w")
for i in l:
   f.write (i+'\n')
f.close ()

This also raises the question of allowing multi-line text inputs in say the 
street field I guess.

> No, street names are street names - one line.
> There's an "addendum" field in the address table for all the extra address
> stuff, IMHO this (and only this) field should be able to have newlines.

Topic revision: 16 Sep 2006, JamesBusser
 
Download.png
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback
Powered by Olark