Kubke Lab:Research/ABR/Notebook/2013/11/06: Difference between revisions
(Autocreate 2013/11/06 Entry for Kubke_Lab:Research/ABR/Notebook)
=Personal Entries=
==Fabiana==
*From file 2013-11-06-MFK.Rmd in Sandbox
===Trying to organise file management===
Need to:
* open the folder, grab the list of files
* separate log files from text files
* put them in a data.frame with one column having txt files and another having log files
* make sure that the names of txt and log files actually match, and put NA where one of the file pairs is missing
<html>
<body>
<h1>Trying to organise file management</h1>
<p>1) Change directory to case #<br>
2) Get directory list<br>
3) Subset files with log onto one column<br>
4) Subset files with txt onto a second column</p>
<p>Structure of files is different for different case numbers:</p>
<p>Owl189 -> 189###.LOG, 189###.TXT (99 objects: 98 files + WS_FTP.log)<br>
Owl222 -> 222###.LOG, 222###.TXT (31 objects: 30 files + WS_FTP.log)<br>
Owl224 -> 224###.LOG, 224###.TXT (39 objects: 38 files + WS_FTP.log)<br>
Owl229 -> 229###.LOG, 229###.TXT (34 objects: 33 files + WS_FTP.log)<br>
Owl230 -> 230###.LOG, 230###.txt (333 objects, ????)<br>
Owl233 -> 233###.ABR.log, 233###.ABR.txt (311 objects)<br>
Owl335 -> 335###.ABR.log, 335###.ABR.txt (172 objects)<br>
Owl336 -> 336###.abr.log, 336###.abr.txt (20 objects)<br>
Owl416 -> 416###.abr.log, 416###.abr.txt (358 objects)<br>
Owl419 -> 419###.abr.log, 419###.abr.txt (396 objects)<br></p>
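<p>A minimal sketch of one way to reduce both naming conventions above to a common stem, assuming the stem is everything before the first dot. The helper name <code>file_stem</code> and the example file names are hypothetical, made up to mirror the listed patterns; they are not files from these folders.</p>
<pre><code class="r"># Hypothetical helper: strip everything from the first dot onwards
file_stem <- function(x) sub("\\..*$", "", x)

# Made-up names following the patterns listed above
examples <- c("222L0A.LOG", "222L0A.TXT", "233L0B.ABR.log", "336L01.abr.txt")
stems <- file_stem(examples)

# Lower-cased extension, so LOG/log and TXT/txt are treated alike
types <- tolower(tools::file_ext(examples))
print(data.frame(file = examples, stem = stems, type = types))
</code></pre>
<p>Keying on the stem rather than on a fixed number of characters would tolerate both the 222###.TXT and the 233###.ABR.txt layouts, as long as no stem itself contains a dot.</p>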
<p>Missing or suspect txt files, by case:<br>
Owl 189 (49 log, 49 txt)<br>
Owl 222 (15 log, 15 txt, one txt too small)<br>
Owl 224 (19 log, 19 txt, one txt too small)<br>
Owl 229 (18 log, 15 txt)<br>
Owl 230 (several 0 KB files and several small txt files) Folder 230(check has 206 files?)<br>
Owl 233 (several 0 KB files and small txt files) (folder 233(P) only has 64 objects)<br>
Owl 335 (several 0 KB files and small txt files)<br>
Owl 336 (10 log, 10 txt)<br>
Owl 416 (several empty files)<br>
Owl 419 (several empty files)<br></p>
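<p>The empty and undersized files noted above could be flagged automatically from their sizes. This is only a sketch: the helper <code>flag_small_files</code>, the 1024-byte threshold and the demo file names are assumptions for illustration, not values taken from the data.</p>
<pre><code class="r"># Hypothetical helper: list the files in a folder with their sizes and flag small ones.
# min_bytes is an assumed threshold, not a value from these notes.
flag_small_files <- function(folder, min_bytes = 1024) {
  paths <- list.files(folder, full.names = TRUE)
  sizes <- file.size(paths)
  data.frame(file = basename(paths),
             bytes = sizes,
             empty = sizes == 0,
             small = sizes < min_bytes,
             stringsAsFactors = FALSE)
}

# Demo on a temporary folder with made-up files
demo_dir <- file.path(tempdir(), "owl_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "000L0A.ABR.txt"))   # 0-byte file
writeLines(rep("0.1 0.2 0.3", 500), file.path(demo_dir, "000L0B.ABR.txt"))
size_report <- flag_small_files(demo_dir)
print(size_report)
</code></pre>
<p>A sensible threshold would have to be chosen by looking at the size of a known-good txt file for each case.</p>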
<p>Testing with Owl222</p>
<pre><code class="r"># basedir <- getwd()
# enter case to analyse:
# newdir <- readline('enter case number: ')
# create dir name as basedir\Analysis\datafiles\case:
# casedir <- paste(basedir, newdir, sep = '/')
# setwd(casedir)
</code></pre>
<p>or</p>
<pre><code class="r"># newdir is the case number entered above; file.path() builds the full path
# dir.create(file.path(basedir, newdir), showWarnings = FALSE)
# setwd(file.path(basedir, newdir))
</code></pre>
<p>1) Need to get the list of files<br>
2) Need to separate files as [whatever].log in one column and [whatever].txt in another<br>
3) Somehow I need to know if the [whatever] parts on the same row do not match</p>
<p>Back to testing on folder Owl222<br>
Owl222 -> 222###.LOG, 222###.TXT (31 objects: 30 files + WS_FTP.log)</p>
<pre><code class="r">files <- dir()
head(files)
</code></pre>
<pre><code>## [1] "189L0A.ABR" "2013-10-27-MFK.Rmd" "2013-10-27.html"
## [4] "2013-10-27.R" "2013-10-27.txt" "2013-10-28-MFK.html"
</code></pre>
<pre><code class="r"># need to separate the txt from the logfiles
# match names ending in .log in any case; inside [] a | is a literal character, so it is dropped
log_match <- regexpr("^.*\\.[Ll][Oo][Gg]$", files)
logfiles <- regmatches(files, log_match)
length(logfiles)
</code></pre>
<pre><code>## [1] 2
</code></pre>
<pre><code class="r">print(logfiles)
</code></pre>
<pre><code>## [1] "419L76.ABR.log" "texput.log"
</code></pre>
<pre><code class="r">
# same approach for names ending in .txt in any case
txt_match <- regexpr("^.*\\.[Tt][Xx][Tt]$", files)
txtfiles <- regmatches(files, txt_match)
length(txtfiles)
</code></pre>
<pre><code>## [1] 5
</code></pre>
<pre><code class="r">print(txtfiles)
</code></pre>
<pre><code>## [1] "2013-10-27.txt" "2013-11-05-b-MFK.txt" "233L0B.ABR.txt"
## [4] "419L76.ABR.txt" "mydata_new.txt"
</code></pre>
<p>Now I need to put those into a single data frame, making sure that the log and txt file names are matched. So I am trying to compare the first 6 characters of each.</p>
<p>I can assume that if I do not have a txt file, it is irrelevant whether I have a log file or not. So I can step through txtfiles one by one, look for the match in logfiles, and write the pair to a data frame where column 1 holds the txt files and column 2 the log files. If a log file is missing, I can put an NA.</p>
<pre><code class="r">n <- length(txtfiles)
i <- 1
traces <- txtfiles[1:n]
# logfiles is shorter than txtfiles here, so indexing up to n pads headers with NA
headers <- logfiles[1:n]
# provisional pairing by position only; the names are not matched yet
casefiles <- data.frame(traces, headers)
# while(i<n+1){ get traces[i]
# extract the first 6 characters of each txt file name (traces): '^.{6}'
test <- regexpr("^.{6}", traces)
test2 <- regmatches(traces, test)
print(test2)
</code></pre>
<pre><code>## [1] "2013-1" "2013-1" "233L0B" "419L76" "mydata"
</code></pre>
<pre><code class="r">
# look for a match in logfiles (headers)
test3 <- regexpr("^.{6}", headers)
test4 <- regmatches(headers, test3)
print(test4)
</code></pre>
<pre><code>## [1] "419L76" "texput"
</code></pre>
<pre><code class="r">
# i=i+1 }
</code></pre>
<p>Plan: grab the first element of test2 and move down through test4 until I find a match. When I do, write the pair into casefiles$traces and casefiles$headers. The stripped parts of the strings need to be restored, so the file names should come not from test2 and test4 but from the full file names stored in traces and headers (using i and j for location). Perhaps I can run regexpr and regmatches on the individual elements rather than creating a new vector? Then write txtfiles[i] and its matching log file onto casefiles$txt and casefiles$log.</p>
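<p>One possible alternative to stepping through with i and j is a vectorised lookup with <code>match()</code>. The sketch below assumes, as in the notes above, that the first 6 characters identify a recording; the two vectors are copied from the outputs printed earlier in this entry.</p>
<pre><code class="r"># File names as printed earlier in this entry
txtfiles <- c("2013-10-27.txt", "2013-11-05-b-MFK.txt", "233L0B.ABR.txt",
              "419L76.ABR.txt", "mydata_new.txt")
logfiles <- c("419L76.ABR.log", "texput.log")

# Key each file by its first 6 characters
txt_key <- substr(txtfiles, 1, 6)
log_key <- substr(logfiles, 1, 6)

# match() returns, for each txt key, the position of the first equal log key (NA if none),
# so indexing logfiles with it yields the full log name or NA
casefiles <- data.frame(traces = txtfiles,
                        headers = logfiles[match(txt_key, log_key)],
                        stringsAsFactors = FALSE)
print(casefiles)
</code></pre>
<p>Here the full file names are kept in both columns, txt files without a log get NA, and log files without a txt are simply not listed, in line with the assumption above that they are irrelevant.</p>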
</body>
</html>
==Andy==
*Enter content here
Revision as of 03:14, 6 November 2013
Hearing development in barn owls
General Entries
Oris