Category archive: Programming

Using trigonometric functions in R

R expects angles in radians as input to its trigonometric functions.
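As a small illustration of what that means in practice, the sketch below converts degrees to radians before calling sin(). The helper name deg2rad is made up for this example; it is not a built-in R function.

```r
# Hypothetical helper: convert degrees to radians (not part of base R)
deg2rad <- function(deg) deg * pi / 180

angles_deg <- c(0, 30, 90, 180)

# sin() interprets its argument as radians, so convert first
sines <- sin(deg2rad(angles_deg))
print(round(sines, 3))
```

Calling sin(90) directly would treat 90 as radians and return a value that has nothing to do with a right angle.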

Now we can plot the function.

And by playing with the functions we get a funny graphic output.


And if we include the tangent, the graphic looks like this:

 

Audio file conversion with afconvert (mac)

I was looking for a simple and elegant way to convert a large number of audio files from one format (.caf) to another (.aif). The solution I found is very elegant and comes included with your operating system, if you are using a Mac.

And now here is the most amazing part: it is super easy to convert multiple files with just one command line.

or to run through subdirectories:

or with recursion by using find:


Key            Linear PCM format
LE             Little endian
BE             Big endian
F              Floating point
I              Integer
UI             Unsigned integer
8/16/24/32/64  Number of bits

Number of bits  Information size (number of distinct values)
8               256
16              65536
24              16777216
32              4294967296
64              18446744073709551616
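The sizes in this table follow from the fact that n bits can encode 2^n distinct values. A quick way to reproduce them in R (the variable names are chosen just for this sketch):

```r
bits   <- c(8, 16, 24, 32, 64)
values <- 2^bits   # stored as doubles; powers of two are represented exactly

# sprintf("%.0f") avoids scientific notation for the large numbers
size_table <- data.frame(bits = bits, values = sprintf("%.0f", values))
print(size_table)
```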

Audio file and data formats (each file format is followed by its supported data formats):
‘3gpp’ = 3GP Audio (.3gp) ‘Qclp’ ‘aac ‘ ‘aace’ ‘aach’ ‘aacl’ ‘aacp’ ‘samr’
‘3gp2’ = 3GPP-2 Audio (.3g2) ‘Qclp’ ‘aac ‘ ‘aace’ ‘aach’ ‘aacl’ ‘aacp’ ‘samr’
‘adts’ = AAC ADTS (.aac, .adts) ‘aac ‘ ‘aach’ ‘aacp’
‘ac-3’ = AC3 (.ac3) ‘ac-3’
‘AIFC’ = AIFC (.aifc, .aiff, .aif) I8 BEI16 BEI24 BEI32 BEF32 BEF64 UI8 ‘ulaw’ ‘alaw’ ‘MAC3’ ‘MAC6’ ‘ima4’ ‘QDMC’ ‘QDM2’ ‘Qclp’ ‘agsm’
‘AIFF’ = AIFF (.aiff, .aif) I8 BEI16 BEI24 BEI32
‘amrf’ = AMR (.amr) ‘samr’
‘m4af’ = Apple MPEG-4 Audio (.m4a, .m4r) ‘aac ‘ ‘aace’ ‘aach’ ‘aacl’ ‘aacp’ ‘alac’
‘caff’ = CAF (.caf) ‘.mp1’ ‘.mp2’ ‘.mp3’ ‘QDM2’ ‘QDMC’ ‘Qclp’ ‘Qclq’ ‘aac ‘ ‘aace’ ‘aach’ ‘aacl’ ‘aacp’ ‘alac’ ‘alaw’ ‘dvi8’ ‘ilbc’ ‘ima4’ I8 BEI16 BEI24 BEI32 BEF32 BEF64 LEI16 LEI24 LEI32 LEF32 LEF64 ‘ms\x00\x02’ ‘ms\x00\x11’ ‘ms\x001’ ‘paac’ ‘samr’ ‘ulaw’
‘MPG1’ = MPEG Layer 1 (.mp1, .mpeg, .mpa) ‘.mp1’
‘MPG2’ = MPEG Layer 2 (.mp2, .mpeg, .mpa) ‘.mp2’
‘MPG3’ = MPEG Layer 3 (.mp3, .mpeg, .mpa) ‘.mp3’
‘mp4f’ = MPEG-4 Audio (.mp4) ‘aac ‘ ‘aace’ ‘aach’ ‘aacl’ ‘aacp’
‘NeXT’ = NeXT/Sun (.snd, .au) I8 BEI16 BEI24 BEI32 BEF32 BEF64 ‘ulaw’
‘Sd2f’ = Sound Designer II (.sd2) I8 BEI16 BEI24 BEI32
‘WAVE’ = WAVE (.wav) UI8 LEI16 LEI24 LEI32 LEF32 LEF64 ‘ulaw’ ‘alaw’

Creating a beat frequency interference with R

A beat frequency arises from mixing two frequencies that are very close to each other but not identical. The trick is that they are too close to be separated by the human ear as two distinct frequencies, so the listener hears a single tone with fluctuating amplitude, i.e. a periodic change in volume. The single fluctuating tone is a matter of perception: the signal still consists of two tones, which can be measured physically with appropriate instruments. Furthermore, the effect also works in a binaural setting, where each ear hears only one of the two frequencies.

The following graphic shows two nearly identical sine waves, one at 440 Hz and one slightly below, at 435 Hz. The sound data is produced for exactly 2 seconds at a 44100 Hz sample rate, giving us 88200 sample points. The first three panels of the graph show only the beginning of the wave, whereas the last one presents the combination of both signals for the complete 2 seconds.
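To make the numbers concrete, here is a minimal sketch of how such sample points could be generated in base R. It is not the original code of this post; all variable names are assumptions made for the example.

```r
sample_rate <- 44100                    # samples per second
duration    <- 2                        # seconds
n_samples   <- sample_rate * duration   # 88200 sample points

# time stamp of every sample point, starting at t = 0
t <- (0:(n_samples - 1)) / sample_rate

wave_440 <- sin(2 * pi * 440 * t)       # 440 Hz sine wave
wave_435 <- sin(2 * pi * 435 * t)       # 435 Hz sine wave
wave_sum <- wave_440 + wave_435         # combination of both signals
```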

[Figure: 435 Hz and 440 Hz sine waves and the resulting beat frequency, 4 plots]

Basically, a combination of two sine waves can be represented mathematically by:

y(t) = a1·sin(2π·f1·t) + a2·sin(2π·f2·t)

If we assume that both amplitudes are the same (a1 = a2 = a), we get the reduced form:

y(t) = 2a·cos(2π·((f1 − f2)/2)·t)·sin(2π·((f1 + f2)/2)·t)

It is interesting to understand that the resulting frequency of the beat, i.e. the recognized periodic fluctuation of volume, is given by:

f_beat = |f1 − f2|
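The relationship can be checked numerically. The sketch below assumes the usual sum-to-product identity sin(A) + sin(B) = 2·cos((A − B)/2)·sin((A + B)/2) and equal amplitudes; the variable names are chosen for this example only.

```r
f1 <- 440   # Hz
f2 <- 435   # Hz
a  <- 1     # common amplitude of both waves
t  <- seq(0, 2, by = 1 / 44100)

# sum of the two sine waves
lhs <- a * sin(2 * pi * f1 * t) + a * sin(2 * pi * f2 * t)

# product form: slow cosine envelope times fast sine carrier
rhs <- 2 * a * cos(2 * pi * (f1 - f2) / 2 * t) * sin(2 * pi * (f1 + f2) / 2 * t)

max_diff <- max(abs(lhs - rhs))   # only floating point noise remains
f_beat   <- abs(f1 - f2)          # volume swells per second
```

The envelope itself oscillates at (f1 − f2)/2, but the volume peaks twice per envelope period, which is why the audible beat is at the full difference of the two frequencies.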

 

440 Hz – sine – 2 seconds

435 Hz – sine – 2 seconds

435 & 440 Hz – sine – resulting beat frequency – 2 seconds

The oscillations in this post are simply created in R by using standard mathematical functions in combination with the time series package in R. In addition, the seewave package is used to store the sine waves as .wav files on the system.

The time series package handles data as equispaced points in time. This matches the sampling of continuous sound signals as they are digitized. A commonly used sampling frequency for CD quality is 44.1 kHz, which results in 88.2k sample points for a length of 2 seconds.

For ease of use, the amplitude 2a of the summed signal is reduced to a by dividing by 2.
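A minimal sketch of these two steps, assuming that "the time series package" refers to ts() from the stats package that ships with R. This is not the original code of the post, and writing the .wav file with seewave is left out here.

```r
sample_rate <- 44100
t <- (0:(2 * sample_rate - 1)) / sample_rate   # 2 seconds of sample times

# the sum of two unit-amplitude sines peaks at 2a = 2, so halve it
mix <- (sin(2 * pi * 440 * t) + sin(2 * pi * 435 * t)) / 2

# ts() stores the samples as equispaced points in time
mix_ts <- ts(mix, start = 0, frequency = sample_rate)
```

After the division the signal stays within the range -1 to 1, which is convenient when the samples are later written to an audio file.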

The graphical representation of the sound can easily be saved as a .jpg file to the system.

In addition to the sample above, we can also see and hear what happens when the beat effect fades out and the brain starts to recognize two different tones. The next few examples present the resulting wave after summing two different frequencies, where one is always 440 Hz.

[Figure: beat frequency, 9 examples]

 

435 & 440 Hz – sine – resulting beat frequency – 2 seconds

425 & 440 Hz – sine – resulting (beat) frequency – 2 seconds

415 & 440 Hz – sine – resulting (beat) frequency – 2 seconds

405 & 440 Hz – sine – resulting (beat) frequency – 2 seconds

395 & 440 Hz – sine – resulting (beat) frequency – 2 seconds

485 & 440 Hz – sine – resulting (beat) frequency – 2 seconds


  • http://cran.r-project.org/web/views/TimeSeries.html
  • http://cran.r-project.org/web/packages/seewave/index.html

LaTeX code for the formulas above:
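One possible LaTeX rendering of the three beat frequency formulas, assuming the symbols a, f_1, f_2 and t for the amplitude, the two frequencies and time:

```latex
% sum of two sine waves
y(t) = a_1 \sin(2 \pi f_1 t) + a_2 \sin(2 \pi f_2 t)

% reduced form for equal amplitudes a_1 = a_2 = a
y(t) = 2a \cos\left(2 \pi \frac{f_1 - f_2}{2} t\right) \sin\left(2 \pi \frac{f_1 + f_2}{2} t\right)

% beat frequency
f_{beat} = \left| f_1 - f_2 \right|
```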

Syndication feed reader using the Project Rome API – (1_FeedReader.java)


Implementing a MySQL DB connection via ooRexx by using BSF4ooRexx – (3_MySql_connector.rxj)


 

ooRexx with BSF4ooRexx – “java.net.URL” Classes (2_getinfo.rxj)


Syndication Feed Reader (1_Read.rxj) – ooRexx with BSF4ooRexx


Analyzing the World GDP Development using R

This is just a simple example of how to download and visualize data from the web using R. Specifically, it uses data tables from the World Bank data section, which contain absolute GDP values per country from 1980 to 2012.

http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?display=default

There are 7 tables, each covering 4 to 5 years of GDP data. The script downloads each of these tables and merges them into a single data frame, which makes the data easily accessible for further analysis.

If we simply sum up all the single absolute GDP values per year we get the world GDP values for the period 1980 – 2012. The graphic below shows this development, measured in billions of USD.

World GDP 1980 -2012


 
It's easy to see that there are two periods of stronger growth as well as a period of stagnation around 1996 to 2002, which might be explained by the Asian financial crisis of 1997/98 and the dot-com bubble turmoil. Interestingly, compared to the impact of the global financial crisis of 2007/08, there was no severe setback in world GDP output in this earlier period.

The impact of the globally spread financial crisis of 2007/08, driven by the subprime crisis, is easily observable in the graphic above, though it comes with an expected time delay of one year. Global output was still growing in 2007/08 while the financial downturn was already spreading worldwide (this can be seen by looking at major stock market indices around the world, which I will try to show in another post). The setback in GDP output came one year later, in 2009, where the plot shows a strong drop, followed by a recovery in output in 2010.

Here is another example of a simple output, for the seven wealthiest countries in the world, measured by their absolute GDP value, from 1980 to 2012.

GDP top 7 countries in 1980 - 2012

GDP top 7 countries in 2012

The similarity of the US output to the world GDP output is obvious at first sight: it shows the same setback in 2009 but no stagnation through the 1998 to 2002 period. The absence of a setback in GDP output during that period might indicate a weak relationship between the financial turmoil of that time and real economic output.

Another interesting fact is the growth of China's output, especially from 2002 onwards. This raises the question of what caused this rapid growth and whether some kind of regime switch occurred during that time. The rapid growth puts China in second place worldwide, measured in absolute GDP value, from 2009 onwards.

The last graphic is for the countries ranked 8 to 14, measured by their absolute GDP value in USD.

GDP top 8 - 14 Countries in 2012


 

Ok, so here is the code:

Basically I use two packages for these simple analytics. The “XML” package is used to gather the data from the World Bank database and the “gridExtra” package is used for the graphical representation of a given data set. The rest of the functionality already comes with the standard installation of R.

The first step is to download all the available data sets from the World Bank database and merge them into one single data frame, which makes data handling easy. (To keep things straightforward, I just downloaded all the data sets one by one. Of course it would be possible to compress and automate this process with loops, but I will try to cover that in another post, along with the automated gathering of data from different web sources such as World Bank, IMF, Yahoo, Google and Quandl data.)
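The merge step could look roughly like the sketch below. The download itself is left out: the two small data frames are made-up stand-ins for the downloaded tables, and the column name "country" as well as the "y1980"-style year columns are assumptions made for this example.

```r
# Made-up stand-ins for two of the downloaded tables
gdp_1980_1981 <- data.frame(country = c("A", "B", "C"),
                            y1980 = c(100, 50, 5),
                            y1981 = c(110, 55, 6))
gdp_1982_1983 <- data.frame(country = c("A", "B", "C"),
                            y1982 = c(120, 60, 7),
                            y1983 = c(130, 65, 8))

tables <- list(gdp_1980_1981, gdp_1982_1983)

# Merge all tables pairwise on the country column;
# all = TRUE keeps countries that are missing from one of the tables
gdp_all <- Reduce(function(x, y) merge(x, y, by = "country", all = TRUE),
                  tables)
print(gdp_all)
```

Using Reduce() over a list means the same line works for 2 tables or for all 7.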

Once we have gathered all the GDP data in one frame, it is necessary to transform and clean some of it, because some cells in the data frame have no value (i.e. NA values in R) or are in the wrong data format (e.g. character instead of numeric).

Basically we replace all NA values with a numerical 0, set all entries to numerical values and add the column names, which are the respective years of the data series, to the data frame.
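These three cleaning steps can be sketched in base R as follows. The tiny data frame is made up for illustration and stands in for the merged GDP table; it is not the original code of this post.

```r
# Made-up raw table: values arrive as character strings, some are missing
gdp <- data.frame(V1 = c("100", "50", NA),
                  V2 = c("110", NA, "6"),
                  row.names = c("A", "B", "C"),
                  stringsAsFactors = FALSE)

gdp[is.na(gdp)] <- 0                 # 1. replace all NA values with 0
gdp[] <- lapply(gdp, as.numeric)     # 2. convert every column to numeric
colnames(gdp) <- c("1980", "1981")   # 3. use the years as column names
print(gdp)
```

Note that replacing NA with 0 treats missing data as zero output, which lowers any sums computed later for the affected years.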

The result of this process can be seen in the plot below, which shows part of the compiled table.

Output 1:

World GDP Distribution

World GDP Data by Country

The next step is the graphical representation of the data, which is the main goal of this short study. For this we transpose the data frame and use the “xts” framework for time series handling and graphical output.

For the first plot (the cumulative world GDP) one more step is needed before we can present the data: the values of all countries have to be summed up for each year to get the world GDP of that year.
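The yearly summation can be sketched like this. The numbers are made up, and base R's ts() is used instead of “xts” only to keep the example free of extra packages.

```r
# Made-up cleaned table: one row per country, one column per year
gdp <- data.frame(`1980` = c(100, 50, 5),
                  `1981` = c(110, 55, 6),
                  `1982` = c(120, 60, 7),
                  row.names = c("A", "B", "C"),
                  check.names = FALSE)

# Sum over all countries for each year
world_gdp <- colSums(gdp)

# Yearly time series starting in 1980
world_ts <- ts(world_gdp, start = 1980, frequency = 1)
print(world_ts)
```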

Output 2:

World GDP 1980 -2012


Plots for the top 7 countries measured in GDP value:
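Selecting the countries by rank could be done along these lines. The function name rank_slice and the data are made up for this sketch; the same call would serve ranks 1 to 7 and ranks 8 to 14 alike.

```r
# Made-up table: one row per country, one column per year
gdp <- data.frame(`2011` = c(5, 15, 3, 7),
                  `2012` = c(6, 16, 4, 8),
                  row.names = c("A", "B", "C", "D"),
                  check.names = FALSE)

# Hypothetical helper: rows ranked `from` to `to` by the value in `year`
rank_slice <- function(gdp, year, from, to) {
  ranked <- gdp[order(gdp[[year]], decreasing = TRUE), , drop = FALSE]
  ranked[from:to, , drop = FALSE]
}

top2  <- rank_slice(gdp, "2012", 1, 2)   # the two largest economies in 2012
next2 <- rank_slice(gdp, "2012", 3, 4)   # the following two
```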

Output 3:

GDP top 7 countries in 2012


Plots for the top 8-14 countries measured in GDP value:

Output 4:

GDP top 8 - 14 Countries in 2012


– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –

Please note that I do not always try to produce perfectly efficient code from a programming point of view, but rather focus on the results and the sociological, econometric and other insights gathered through the process of data analytics.

If my focus is specifically on coding, I will try to cover that in a separate post dedicated to that topic. Nevertheless, I strongly appreciate suggestions, additions and corrections regarding the analytical insights, the data analysis process and the programming code.

– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –

Data Source:  data.worldbank.org/

Software: www.r-project.org/
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Martin Stoppacher,  © 2014
 

An example of a syndication feed reader using the Project Rome API with BSF4ooRexx – “java.net.URL” feed reader (2_Rome.rxj)