Kubke Lab:Research/ABR/Notebook/2013/11/06: Difference between revisions

From OpenWetWare
(Autocreate 2013/11/06 Entry for Kubke_Lab:Research/ABR/Notebook)
 
=Personal Entries=
==Fabiana==
*From file 2013-11-06-MFK.Rmd in Sandbox
===Trying to organise file management===
Need to:
* open the folder, grab the list of files
* separate log files from text files
* put them in a data.frame with one column having txt files and another having log files
* make sure that the names of txt and log files actually match, and put NA where one of the file pairs is missing
 
<html>
<body>
<h1>Trying to organise file management</h1>
 
<p>1) Change directory to the case number<br>
2) Get the directory list<br>
3) Subset files with log into one column<br>
4) Subset files with txt into a second column</p>
 
<p>Structure of files is different for different case numbers:</p>
 
<p>Owl189 -&gt; 189###.LOG, 189###.TXT (99 objects: 98 files + WS_FTP.log)<br>
Owl222 -&gt; 222###.LOG, 222###.TXT (31 objects: 30 files + WS_FTP.log)<br>
Owl224 -&gt; 224###.LOG, 224###.TXT (39 objects: 38 files + WS_FTP.log)<br>
Owl229 -&gt; 229###.LOG, 229###.TXT (34 objects: 33 files + WS_FTP.log) <br>
Owl230 -&gt; 230###.LOG, 230###.txt (333 objects, ????)<br>
Owl233 -&gt; 233###.ABR.log, 233###.ABR.txt (311 objects)<br>
Owl335 -&gt; 335###.ABR.log, 335###.ABR.txt (172 objects)<br>
Owl336 -&gt; 336###.abr.log, 336###.abr.txt (20 objects)<br>
Owl416 -&gt; 416###.abr.log, 416###.abr.txt (358 objects)<br>
Owl419 -&gt; 419###.abr.log, 419###.abr.txt (396 objects)<br></p>
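<p>Because the suffix changes between cases (.LOG, .ABR.log, .abr.txt), one option is to reduce every name to its recording ID before comparing. This is a minimal sketch that assumes the naming patterns listed above; the helper name file_stem is hypothetical and not part of the notebook.</p>

```r
# Hypothetical helper: reduce a file name to its recording ID,
# assuming names look like 222###.LOG, 233###.ABR.txt or 416###.abr.log.
file_stem <- function(x) {
  # drop a trailing .log or .txt, ignoring case
  x <- sub("\\.(log|txt)$", "", x, ignore.case = TRUE)
  # then drop an optional trailing .ABR, ignoring case
  sub("\\.abr$", "", x, ignore.case = TRUE)
}

file_stem(c("222L0A.LOG", "233L0B.ABR.txt", "416L12.abr.log"))
```

<p>With the three made-up names above, the call should return the stems 222L0A, 233L0B and 416L12, which can then be compared regardless of the suffix convention.</p>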
 
<p>Missing txt files in <br>
Owl 189 (49 log, 49 txt)<br>
Owl 222 (15 log, 15 txt, one txt too small size)<br>
Owl 224 (19 log, 19 txt, one txt too small size)<br>
Owl 229 (18 log, 15 txt)<br>
Owl 230 (several 0 KB files and several small txt files) Folder 230(check has 206 files?)<br>
Owl 233 (several 0 KB files and small txt files) (folder 233(P) only has 64 objects)<br>
Owl 335 (several 0 KB files and small txt files)<br>
Owl 336 (10 log files, 10 txt)<br>
Owl 416 (several empty files)<br>
Owl 419 (several empty files)<br></p>
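<p>The empty and undersized files could be found programmatically rather than by eye. The sketch below uses file.info() to flag them; the function name flag_small_files and the 1024-byte threshold are assumptions for illustration, since the notes do not say what counts as too small.</p>

```r
# Hypothetical helper: list the files in a folder with their sizes and flag
# those that are empty or below an assumed size threshold (in bytes).
flag_small_files <- function(folder = ".", min_bytes = 1024) {
  paths <- list.files(folder, full.names = TRUE)
  paths <- paths[!dir.exists(paths)]  # keep files only, skip sub-folders
  sizes <- file.info(paths)$size
  data.frame(
    file = basename(paths),
    bytes = sizes,
    empty = sizes == 0,
    too_small = sizes < min_bytes,
    stringsAsFactors = FALSE
  )
}
```

<p>Calling flag_small_files() inside a case folder and subsetting on the empty or too_small columns would give the list of files to check by hand.</p>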
 
<p>testing with Owl222</p>
 
<pre><code class="r"># basedir &lt;- getwd()
# enter case to analyse:
# newdir &lt;- readline(&#39;enter case number: &#39;)
# create dir name as basedir\Analysis\datafiles\case:
# casedir &lt;- paste(basedir, newdir, sep = &#39;/&#39;)
# setwd(casedir)
</code></pre>
 
<p>or</p>
 
<pre><code class="r"># dir.create(file.path(basedir, casedir), showWarnings = FALSE)
# setwd(file.path(basedir, casedir))
</code></pre>
 
<p>1) Need to get the list of files<br>
2) Need to separate the files, with [whatever].log in one column and [whatever].txt in another<br>
3) Need some way to detect when the [whatever] parts on the same row do not match</p>
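<p>Steps 1 and 2 can also be done in one pass with list.files(), which accepts a pattern and an ignore.case argument. This is only a sketch; the function name split_by_extension and the folder argument are assumptions, and a stray file such as WS_FTP.log would still be picked up and need dropping afterwards.</p>

```r
# Sketch: split a directory listing into log and txt files by extension.
# The folder argument is an assumption; "." means the working directory.
split_by_extension <- function(folder = ".") {
  list(
    logfiles = list.files(folder, pattern = "\\.log$", ignore.case = TRUE),
    txtfiles = list.files(folder, pattern = "\\.txt$", ignore.case = TRUE)
  )
}
```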
 
<p>Back to testing on folder Owl222<br>
Owl222 -&gt; 222###.LOG, 222###.TXT (31 objects: 30 files + WS_FTP.log)</p>
 
<pre><code class="r">files &lt;- dir()
head(files)
</code></pre>
 
<pre><code>## [1] &quot;189L0A.ABR&quot;          &quot;2013-10-27-MFK.Rmd&quot;  &quot;2013-10-27.html&quot;   
## [4] &quot;2013-10-27.R&quot;        &quot;2013-10-27.txt&quot;      &quot;2013-10-28-MFK.html&quot;
</code></pre>
 
<pre><code class="r"># need to separate the txt from the logfiles
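# match whole names ending in .log (any case); inside [] a | is a literal character, so [L|l] would also match |
log &lt;- regexpr(&quot;^.*\\.[Ll][Oo][Gg]$&quot;, files)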
logfiles &lt;- regmatches(files, log)
length(logfiles)
</code></pre>
 
<pre><code>## [1] 2
</code></pre>
 
<pre><code class="r">print(logfiles)
</code></pre>
 
<pre><code>## [1] &quot;419L76.ABR.log&quot; &quot;texput.log&quot;
</code></pre>
 
<pre><code class="r">
# match whole names ending in .txt (any case)
txt &lt;- regexpr(&quot;^.*\\.[Tt][Xx][Tt]$&quot;, files)
txtfiles &lt;- regmatches(files, txt)
length(txtfiles)
</code></pre>
 
<pre><code>## [1] 5
</code></pre>
 
<pre><code class="r">print(txtfiles)
</code></pre>
 
<pre><code>## [1] &quot;2013-10-27.txt&quot;      &quot;2013-11-05-b-MFK.txt&quot; &quot;233L0B.ABR.txt&quot;     
## [4] &quot;419L76.ABR.txt&quot;      &quot;mydata_new.txt&quot;
</code></pre>
 
<p>Now I need to put those into a single data frame, making sure that the file names are matched between log and txt. So I am trying to compare the first 6 characters of each.</p>

<p>I can assume that if I do not have a txt file, it is irrelevant whether I have a log file or not. So I can step through txtfiles one entry at a time, look for the match in logfiles, and write the pair to a data frame where column 1 holds the txt files and column 2 the log files. If a log file is missing, I can put NA in its place.</p>
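<p>The step-through described here could look roughly like the sketch below. The function name pair_by_loop is hypothetical, and the missing value is written as NA_character_ because the columns hold file names rather than numbers.</p>

```r
# Sketch of the step-through: for each txt file, look for a log file whose
# first 6 characters match, and record NA when none is found.
pair_by_loop <- function(txtfiles, logfiles) {
  headers <- rep(NA_character_, length(txtfiles))
  for (i in seq_along(txtfiles)) {
    hit <- which(substr(logfiles, 1, 6) == substr(txtfiles[i], 1, 6))
    # keep the first matching log file, if any
    if (length(hit) > 0) headers[i] <- logfiles[hit[1]]
  }
  data.frame(traces = txtfiles, headers = headers, stringsAsFactors = FALSE)
}

txtfiles <- c("2013-10-27.txt", "233L0B.ABR.txt", "419L76.ABR.txt")
logfiles <- c("419L76.ABR.log", "texput.log")
pair_by_loop(txtfiles, logfiles)
```

<p>With these names, only 419L76.ABR.txt finds a partner, and the other two rows get NA in the headers column.</p>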
 
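<pre><code class="r">n &lt;- length(txtfiles)
i &lt;- 1
traces &lt;- txtfiles[1:n]
# when there are fewer log files than txt files, this pads headers with NA
headers &lt;- logfiles[1:n]
# note: this pairs files by position only, not yet by matching names
casefiles &lt;- data.frame(traces, headers)

# while(i&lt;n+1){ get traces[i]

# extract the first 6 characters at the beginning of each txt file name
# (traces): &#39;^.{6}&#39;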
test &lt;- regexpr(&quot;^.{6}&quot;, traces)
test2 &lt;- regmatches(traces, test)
print(test2)
</code></pre>
 
<pre><code>## [1] &quot;2013-1&quot; &quot;2013-1&quot; &quot;233L0B&quot; &quot;419L76&quot; &quot;mydata&quot;
</code></pre>
 
<pre><code class="r">
# look for a match in logfiles (headers)
test3 &lt;- regexpr(&quot;^.{6}&quot;, headers)
test4 &lt;- regmatches(headers, test3)
print(test4)
</code></pre>
 
<pre><code>## [1] &quot;419L76&quot; &quot;texput&quot;
</code></pre>
 
<pre><code class="r">
# i=i+1 }
</code></pre>
 
<p>Grab the first element of test2 and move down through test4 until I find a match. When I do, write the pair into casefiles$traces and casefiles$headers. I need to restore the parts of the strings that I stripped, so the file names should come not from test2 and test4 but from the full file names stored in traces and headers (using i and j for location). Perhaps I can run regexpr and regmatches on the individual element rather than creating a new vector? Write txtfiles[i] onto casefiles$txt and the matching log file onto casefiles$log.</p>
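<p>One way to keep the full file names without tracking i and j by hand is to compare the prefixes with match() and use the result to index the original vector. This is a sketch under the assumption that the first 6 characters identify a recording; the function name pair_by_prefix and its n argument are hypothetical.</p>

```r
# Sketch: pair full file names by their 6-character prefix without a loop.
pair_by_prefix <- function(txtfiles, logfiles, n = 6) {
  # match() returns, for each txt prefix, the position of the first log file
  # with the same prefix, or NA when there is none
  idx <- match(substr(txtfiles, 1, n), substr(logfiles, 1, n))
  data.frame(txt = txtfiles, log = logfiles[idx], stringsAsFactors = FALSE)
}

casefiles <- pair_by_prefix(
  c("2013-10-27.txt", "2013-11-05-b-MFK.txt", "233L0B.ABR.txt",
    "419L76.ABR.txt", "mydata_new.txt"),
  c("419L76.ABR.log", "texput.log")
)
print(casefiles)
```

<p>Indexing logfiles with an NA position yields NA, so txt files without a partner get NA in the log column automatically, and the names written to the data frame are the untouched originals.</p>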
 
</body>
 
</html>
 
==Andy==
*Enter content here

Revision as of 03:14, 6 November 2013


