tag:blog.dancrisan.com,2014:/feedDan Crisan2016-11-22T22:40:40-08:00Dan Crisanhttp://blog.dancrisan.comSvbtle.comtag:blog.dancrisan.com,2014:Post/a-tiny-intro-to-publishing-your-app-on-google-play2016-11-22T22:40:40-08:002016-11-22T22:40:40-08:00A Tiny Intro to Publishing Your Android App on Google Play<p>The following is a simple technical guide to help developers to publish their Android app on <a href="https://play.google.com">Google Play</a>. </p>
<p>Here is what you need :</p>
<ul>
<li>A Google Play publisher account (which is essentially an Android developer account). You can sign up <a href="https://play.google.com/apps/publish/signup/">here</a>
</li>
<li>Android Studio in order to generate your .apk file (the Android Application Package is a package file format used to distribute and install apps on Android)</li>
</ul>
<p>If you are ready to publish, let’s see how to generate your apk. In Android Studio, access “<em>Build</em>” and then “<em>Build apk</em>” from the top menu : </p>
<p><a href="https://svbtleusercontent.com/motrpvwpcokta.png"><img src="https://svbtleusercontent.com/motrpvwpcokta_small.png" alt="Screen Shot 2016-11-22 at 12.18.20.png"></a></p>
<p>Your application folder now contains a file called <em>app-debug.apk</em> :</p>
<p><a href="https://svbtleusercontent.com/gnk9daqelmbamw.png"><img src="https://svbtleusercontent.com/gnk9daqelmbamw_small.png" alt="Screen Shot 2016-11-22 at 12.29.12.png"></a></p>
<p>From the command line, change directory to where your <em>app-debug.apk</em> file is located. Once there, we need to generate a key that we will use to sign the apk. In the command line terminal, type the following :</p>
<p><strong>keytool -genkey -v -keystore my-release-key.keystore -alias alias_name -keyalg RSA -keysize 2048 -validity 10000</strong></p>
<p>Create a password for the keystore (mandatory) and on the last question, type yes :</p>
<p><a href="https://svbtleusercontent.com/yxd27oun0v8law.png"><img src="https://svbtleusercontent.com/yxd27oun0v8law_small.png" alt="Screen Shot 2016-11-22 at 12.36.45.png"></a></p>
<p>We notice a new file created in our folder, the keystore file : </p>
<p><a href="https://svbtleusercontent.com/pbruldpivrpkiw.png"><img src="https://svbtleusercontent.com/pbruldpivrpkiw_small.png" alt="Screen Shot 2016-11-22 at 12.43.30.png"></a></p>
<p>We’ll use the keystore file to sign the application by typing the following command:</p>
<p><strong>jarsigner -verbose -sigalg SHA1withRSA -digestalg SHA1 -keystore my-release-key.keystore app-debug.apk alias_name</strong></p>
<p>The next step is optimizing the apk with the <em>zipalign</em> tool, which aligns the uncompressed data inside the archive so that the app uses less RAM when running. It’s usually found under <em>/path/to/Android/sdk/build-tools/VERSION/zipalign</em>. On a Mac, it is under <em>~/Library/Android/sdk/build-tools/VERSION/zipalign</em> : </p>
<p><strong>~/Library/Android/sdk/build-tools/yourVersionOfAndroid/zipalign -v 4 app-debug.apk new-optimized.apk</strong></p>
<p><a href="https://svbtleusercontent.com/nrd9lipej0rtmg.png"><img src="https://svbtleusercontent.com/nrd9lipej0rtmg_small.png" alt="Screen Shot 2016-11-22 at 13.08.13.png"></a></p>
<p>We notice a new file created in our folder, <em>new-optimized.apk</em>.</p>
<p><a href="https://svbtleusercontent.com/lynhiulkxo0dya.png"><img src="https://svbtleusercontent.com/lynhiulkxo0dya_small.png" alt="Screen Shot 2016-11-22 at 13.51.36.png"></a></p>
<p>Done! You can now go to <a href="https://play.google.com/apps/publish/">https://play.google.com/apps/publish/</a> and upload your apk. </p>
<p><a href="https://svbtleusercontent.com/x3txptm6sbizra.png"><img src="https://svbtleusercontent.com/x3txptm6sbizra_small.png" alt="Screen Shot 2016-11-22 at 13.49.50.png"></a></p>
<p>Once it has been accepted, you can later publish new versions of your app. You’ll have to change its version in the <em>AndroidManifest.xml</em> file before submitting it (notice lines 3 and 4):</p>
<p><a href="https://svbtleusercontent.com/gnbyvhxhakfy6g.png"><img src="https://svbtleusercontent.com/gnbyvhxhakfy6g_small.png" alt="Screen Shot 2016-11-22 at 21.31.03.png"></a></p>
<p>Then, build the apk again as before and simply re-sign it:</p>
<p><strong>jarsigner -verbose -sigalg SHA1withRSA -digestalg SHA1 -keystore my-release-key.keystore app-debug.apk alias_name</strong></p>
<p>Optimize it once more : </p>
<p><strong>~/Library/Android/sdk/build-tools/yourVersionOfAndroid/zipalign -v 4 app-debug.apk new-optimized.apk</strong></p>
<p>Then, from the left menu, choose “<em>APK</em>”, then “<em>Upload new APK to Production</em>”, and upload the file: </p>
<p><a href="https://svbtleusercontent.com/yausha0qfyuuw.png"><img src="https://svbtleusercontent.com/yausha0qfyuuw_small.png" alt="Screen Shot 2016-11-22 at 22.05.27.png"></a></p>
<p>Done! </p>
<p>If you have any questions or suggestions, send me a tweet at <a href="https://twitter.com/@dandancrisan">@dandancrisan</a> or let’s go for coffee in SF or Montreal. Thanks for reading and I hope it helped! (: </p>
tag:blog.dancrisan.com,2014:Post/tiny-intros2015-12-19T19:08:08-08:002015-12-19T19:08:08-08:00Tiny Intros <p>Here’s what is in progress and what has been covered until now:</p>
<ul>
<li><a href="http://blog.dancrisan.com/a-tiny-intro-to-database-systems">A Tiny Intro to Database Systems</a></li>
<li><a href="http://blog.dancrisan.com/a-tiny-intro-to-android-activities-lifecycle">A Tiny Intro to the Android Activity Lifecycle</a></li>
<li><a href="https://medium.com/@dancrisan/a-tiny-intro-to-github-c4cc653bb64e#.vrv4fr15c">A Tiny Intro to GitHub</a></li>
</ul>
<p>Thank you for reading. If you have any suggestions, feel free to send me a tweet <a href="https://twitter.com/dandancrisan">@dandancrisan</a></p>
tag:blog.dancrisan.com,2014:Post/a-tiny-intro-to-android-activities-lifecycle2015-11-03T20:43:38-08:002015-11-03T20:43:38-08:00A Tiny Intro to the Android Activity Lifecycle<p>The following is a tutorial on the Android Activity Lifecycle, a concept that comes up pretty often during mobile dev interviews.</p>
<p>There is also <a href="https://play.google.com/store/apps/details?id=com.imagineAny.hp1.androidLifecycle">an app to demo the topic</a>: a simple, decent way to go through the lifecycle states. It costs one dollar to support the project, but it is also <a href="https://github.com/dancrisan/Android_Activity_Lifecycle">open sourced on GitHub</a> for people who like to learn with examples and more logs (it can easily be tested in Android Studio).</p>
<p>Before going through the lifecycle process and explaining the main parts, let’s see what an <strong>Android Activity</strong> is. An activity is a view, a window with related design and interactions. The following is a single activity that has 2 buttons and some text describing which state of the Activity Lifecycle the app is in. </p>
<p><a href="https://svbtleusercontent.com/v2hpjs5a8p8qcq.png"><img src="https://svbtleusercontent.com/v2hpjs5a8p8qcq_small.png" alt="Screen Shot 2015-11-03 at 5.34.07 PM.png"></a></p>
<p>This activity, like any Android Activity, goes through the following states throughout its lifecycle: </p>
<p><a href="https://svbtleusercontent.com/iwwmp96qimw.png"><img src="https://svbtleusercontent.com/iwwmp96qimw_small.png" alt="lifecycle.png"></a></p>
<p>1) <strong>onCreate()</strong></p>
<p>The <strong>onCreate()</strong> callback is called whenever a new instance of the activity is created. This means that for any subsequent starts of the same instance, <strong>onCreate()</strong> is <strong>not</strong> called because <strong>the activity was already created/loaded</strong>. The <strong>onCreate()</strong> callback is called when…</p>
<ul>
<li>the user loads a new activity for the first time</li>
<li>the user exits the current view by hitting the <strong>Back button</strong> and then returns to it. That’s because when the user presses the <strong>Back button</strong>, the current activity is destroyed and once the user returns to it, it has to be created again. </li>
</ul>
<p>2) <strong>onStart()</strong></p>
<p>The <strong>onStart()</strong> callback is called, you guessed it, when any instance of the activity starts: when the instance loads for the first time, but also on any subsequent reload, basically anytime an activity comes to the foreground. This is the state in which the activity becomes <em>visible</em> to the user. </p>
<p>3) <strong>onResume()</strong></p>
<p>The <strong>onResume()</strong> callback is called when the activity becomes <em>ready to respond</em> to the user, that is, whenever it is in the foreground and not in the <strong>onPause()</strong> state. Let’s go to the next step in order to have a better understanding of <strong>onResume()</strong>/<strong>onPause()</strong>. </p>
<p>We can test the <strong>onResume()</strong> action by pressing the <strong>Back button</strong> once in the <strong>onPause()</strong> state (click on the <strong>Checkmark button</strong> to pause Main Activity in the Activity Lifecycle app).</p>
<p>4) <strong>onPause()</strong></p>
<p>If the activity is partially visible but not in focus, it is in the <strong>onPause()</strong> state. This is the case in the next example, where a transparent second activity partially obstructs our main activity. We can test the <strong>onPause()</strong> action by pressing the <strong>Checkmark button</strong> in the Activity Lifecycle app. </p>
<p><a href="https://svbtleusercontent.com/vrwwlrunoyinca.png"><img src="https://svbtleusercontent.com/vrwwlrunoyinca_small.png" alt="Screen Shot 2015-11-03 at 9.54.59 PM.png"></a></p>
<p>We can stop the paused activity by sending it to the background and come back to it by bringing it back to the foreground: it will still stay in the <strong>onPause()</strong> state. We can test this by pressing either the <strong>Circle button</strong> or the <strong>Square button</strong> of the device while the application is paused, switching to another view from a different app, and then coming back to our Activity Lifecycle app. </p>
<p><a href="https://svbtleusercontent.com/r1faxmtfj0tlea.png"><img src="https://svbtleusercontent.com/r1faxmtfj0tlea_small.png" alt="Screen Shot 2015-11-03 at 9.59.19 PM.png"></a></p>
<p>In order to better understand the <strong>onPause()/onStop()</strong> difference, let’s have a look at how <strong>onStop()</strong> works. </p>
<p>5) <strong>onStop()</strong></p>
<p>The <strong>onStop()</strong> callback is called when the activity is no longer visible, for instance when it is sent to the background. But what’s the difference between <strong>onPause()</strong> and <strong>onStop()</strong>? <strong>onPause()</strong> is the equivalent of a car stopping at a red light: the motor is not turned off and the car is still running, but with a lot less resource consumption. <strong>onStop()</strong> is closer to a car parked at a grocery store with the motor turned off: the activity is sent to the background and is not driving anymore. You can experiment with <strong>onStop()</strong> by pressing either the <strong>Square button</strong> or the <strong>Circle button</strong>.</p>
<p>6) <strong>onRestart()</strong></p>
<p>The <strong>onRestart()</strong> callback is called when the activity is about to be displayed in the foreground again after being stopped.</p>
<p>7) <strong>onDestroy()</strong></p>
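<p>The <strong>onDestroy()</strong> callback is called when the activity is destroyed, either because the user finishes it or because the operating system reclaims it when it’s low on resources. Note that if the system kills the whole process abruptly, <strong>onDestroy()</strong> is not guaranteed to run. </p>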
<p><strong>One last thing</strong></p>
<p>Now what if we want to run a component in the background (say play music) even if the activity is stopped? Such a component is called <strong>a service</strong>. A service runs in the background without any interaction with the user except for starting and stopping it. It can act independently of any activity lifecycle, processing continuously in the background. Music playback with a MediaPlayer, for example, is typically hosted in a service. </p>
<p>In the Activity Lifecycle app, we can demo the service utility by pressing the <strong>Music button</strong>, sending the app to the background by pressing the <strong>Circle button</strong> and noticing that the music is still playing. </p>
<p>You can press Ctrl + Fn + F6 on a Mac or Ctrl + F6 on a PC to increase the volume. </p>
<p>A service is stopped when the process starting it is destroyed (the app is killed) or when the service is stopped (in the Activity Lifecycle app, press the <strong>Music button</strong> one more time). </p>
<p><a href="https://twitter.com/dandancrisan">@dandancrisan</a></p>
<p><a href="https://play.google.com/store/apps/details?id=com.imagineAny.hp1.androidLifecycle">Mobile app</a></p>
<p><a href="https://github.com/dancrisan/Android_Activity_Lifecycle">Source code</a> </p>
tag:blog.dancrisan.com,2014:Post/a-tiny-intro-to-database-systems2015-04-24T14:11:54-07:002015-04-24T14:11:54-07:00A Tiny Intro to Database Systems<p>Here is a short summary of DBMS : database management systems. </p>
<p>Part of the motivation behind those little chapters is described in another blog post <a href="https://medium.com/@dancrisan/the-intro-to-a-tiny-intro-to-database-systems-hacking-education-through-techblogging-d62e025b7f15">here</a>.</p>
<ul>
<li><a href="http://blog.dancrisan.com/a-quick-intro-to-databases">Introduction</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-part-2-the-entityrelationship-model">The Entity-Relationship Model</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-part-3-the-relational-model">The Relational Model</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-part-4-relational-algebra">Relational Algebra</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-part-6-7-basic-sql">Very Basic SQL</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-part-8-to-10-intermediate-sql">More Basic SQL</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-part-11-to-13-integrity-constraints">Integrity Constraints</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-part-14-to-16-triggers">Triggers</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-basic-perspectives-on-disk-and-buffer-management">On Disk and Buffer Management</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-indexing">Indexing</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-indexing-part-2-b-trees">B+ trees</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-concurrency-control-scheduling-problems">Concurrency Control - Scheduling problems</a></li>
<li><a href="http://blog.dancrisan.com/intro-to-database-systems-schema-refinement-functional-dependencies">Schema Refinement - Functional Dependencies</a></li>
</ul>
<p>If you would like to read those in a nicely formatted PDF or if you have any questions / suggestions / requests, feel free to send me a tweet <a href="https://twitter.com/dandancrisan">@dandancrisan</a>.</p>
tag:blog.dancrisan.com,2014:Post/intro-to-database-systems-schema-refinement-functional-dependencies2015-04-10T16:04:12-07:002015-04-10T16:04:12-07:00Intro to Database Systems : Schema Refinement - Functional Dependencies <p><strong>Schema refinement</strong> is just a fancy term for saying <strong>polishing tables</strong>. It is the last step before considering physical design/tuning with typical workloads:</p>
<ul>
<li>1) Requirement analysis : user needs</li>
<li>2) Conceptual design : high-level description, often using E/R diagrams</li>
<li>3) Logical design : from graphs to tables (relational schema)</li>
<li>4) <strong>Schema refinement</strong> : checking tables for redundancies and anomalies<br>
</li>
</ul>
<p>Let’s see an example of redundancies and anomalies. Consider the following table where the client’s name is the primary key. </p>
<p><a href="https://svbtleusercontent.com/hfmo7agmn9hq.png"><img src="https://svbtleusercontent.com/hfmo7agmn9hq_small.png" alt="first.PNG"></a></p>
<p>The table presents information on employees (sales reps) and their clients. </p>
<p>If we want to <strong>insert data</strong>, we notice that: </p>
<ul>
<li>each row requires an entry in the client field </li>
<li>we can’t insert data for newly hired sales reps until they’ve been assigned to one or more clients </li>
<li>if sales reps are in a training process, even if they’ve been already hired, they can’t actually join the database because they need to have a delegated client… unless “dummy” clients are created. </li>
</ul>
<p>If we want to <strong>update data</strong>, we notice that: </p>
<ul>
<li>the sales rep’s name is repeated for each client.</li>
<li>what if, for a given client, we misspelled the name of the sales rep as Crosby instead of Cosby… how can we edit that without affecting all the sales reps who are really called Crosby?</li>
</ul>
<p>If we want to <strong>delete data</strong>, what if Mary doesn’t have a client anymore because she’s taking a year off? We are forced to either </p>
<ul>
<li>create a dummy client</li>
<li>incorrectly show her with a client she no longer handles </li>
<li>delete Mary’s record (even though she’s still an employee)</li>
<li>notice that we can not have “null” as a client since primary key fields cannot store null. </li>
</ul>
<p>When we deal with <strong>schema refinement</strong>, we often notice that the main problem is <strong>redundancy</strong>. In order to identify schemas with such problems, we’ll introduce the notion of <strong>functional dependencies</strong>: a relationship that exists when one attribute uniquely determines another attribute. A <strong>functional dependency</strong> is simply a new type of constraint between two attributes. </p>
<p>Let R be a relation with attributes X and Y. We say that there is a functional dependency X -> Y when <strong>Y is functionally dependent on X</strong> (where X is the <strong>determinant set</strong> and Y is the <strong>dependent attribute</strong>).</p>
<p>Let’s illustrate a scenario where the designer didn’t take into consideration the dependencies between columns. </p>
<ul>
<li>Data (studID, studName, address, courseID, courseName, grade)</li>
</ul>
<p>The following structure is considerably better:</p>
<ul>
<li>Student(studID, studName, address)</li>
<li>Course (courseID, courseName)</li>
<li>Enrolled (studID, courseID, grade)</li>
</ul>
<p>How do we pass from one to the other? That’s what <strong>schema refinement</strong> does through <strong>functional dependencies</strong>. </p>
<p>Each student is uniquely identified by a studID. Each student has a single address, hence we can say that studID determines <em>address</em>. We’ll write this in the following way:</p>
<ul>
<li>studID - > address</li>
</ul>
<p>In the previous example, we actually have the following FDs:</p>
<ul>
<li>studID - > studName, address</li>
<li>courseID - > courseName</li>
<li>studID, courseID - > grade</li>
</ul>
<p>Let’s have a look at the properties of functional dependencies in the case where X, Y and Z are attributes belonging to a table R : </p>
<ul>
<li>
<strong>transitivity</strong>: if we assume that X - > Y and Y - > Z, then it’s clear that X - > Z<br>
</li>
<li>
<strong>reflexivity</strong>: if Y is a subset of X, then X -> Y</li>
<li>
<strong>augmentation</strong>: if X - > Y, then for any Z we’ll have X, Z - > Y, Z</li>
<li>
<strong>union</strong>: if X - > Y and X - > Z, then X - > Y, Z</li>
<li>
<strong>decomposition</strong>: if X -> Y, Z then X - > Y and X - > Z</li>
</ul>
<p>The first 3 properties are called <strong>Armstrong’s Axioms</strong>.</p>
<p>If F is a set of functional dependencies, F+ is the set of all FDs <strong>logically implied</strong> by F. <strong>Logically implied</strong> is just another way of saying <em>obtained from the properties of functional dependencies</em> (the ones that we just enumerated). F+ is also called <strong>the closure of the set of functional dependencies</strong>. </p>
<p>Let’s illustrate the usage of those properties with an example. If we have the following set of FDs, can we conclude that A - > H is logically implied?</p>
<ul>
<li>A - > B</li>
<li>A - > C</li>
<li>C, G - > H</li>
<li>C, G - > I</li>
<li>B - > H</li>
</ul>
<p>Let’s see which properties are applicable to our case:</p>
<ul>
<li>We know that, by the transitivity property, if X -> Y and Y - > Z then we have X -> Z . </li>
<li>In our case we have A - > B and B - > H. </li>
<li>Hence, by transitivity, A - > H is logically implied. </li>
</ul>
<p>Which other dependencies are part of the closure?</p>
<ul>
<li>CG -> HI by the union rule</li>
<li>AG -> I by noticing that A -> C holds, and then AG -> CG by the augmentation rule and then AG -> I by transitivity. </li>
</ul>
<p>Given a set of FDs, is there a faster way to compute if a dependency is logically implied? </p>
<p>Let’s see through an example how we can ask this question in multiple ways:</p>
<ul>
<li>Does F = {A - > B, B - > C, C D - > E} imply A - > E?</li>
<li>Is A - > E in the closure F+ ?</li>
<li>Is E in A+ ?</li>
</ul>
<p>Before going on with a linear time algorithm, we notice that we’ve introduced a new notion, A+. We call A+ the <strong>attribute closure of A</strong> with respect to F and it will help us figure out if A - > E is logically implied. </p>
<ul>
<li>1) Assume that we create a <strong>temporary attribute closure of A</strong> called TMP and that to begin, TMP = A (the left-hand side of the FD that we want to verify).</li>
<li>2) Let’s consider the first given dependency of F, A - > B.</li>
<li>3) Is A in the <strong>TMP</strong>? Yes, since as stated previously TMP = A; we continue.</li>
<li>4) If we continue, we union B with the current TMP, A. What we obtain is the new TMP, AB (since A union B = AB).</li>
<li>5) We now consider the second given dependency, B - > C. </li>
<li>6) Is B in the <strong>TMP</strong>? Yes, since we now have AB in the TMP; we continue. </li>
<li>7) If we continue, we union C with the current TMP, AB. What we obtain is the new TMP, ABC (since AB union C = ABC).</li>
<li>8) We consider the 3rd given dependency, C D - > E.</li>
<li>9) Is CD in the <strong>TMP</strong>? No, since we only have ABC in the current TMP, which contains C but not D. No dependency is left that could add a new attribute to TMP, hence we stop. </li>
<li>10) The <strong>attribute closure of A</strong> is then A+ = TMP = {A, B, C}</li>
</ul>
<p>Now, to check if A - > E is in the closure F+, we can conclude that since E is <strong>NOT</strong> in A+, then A - > E is <strong>NOT</strong> in F+. </p>
<p>We can generalize this into an algorithm:</p>
<ul>
<li>1) Start with a <strong>temporary attribute closure</strong> TMP equal to the left-hand side of the FD you want to verify.</li>
<li>2) Consider each dependency X - > Y of the given set of FDs.</li>
<li>3) Is X entirely contained in <strong>TMP</strong>? If yes, union TMP with Y. If no, skip this dependency for now and move on to the next one.</li>
<li>4) Repeat steps 2 and 3 until a full pass over the FDs adds nothing new to TMP.</li>
<li>5) Your attribute closure is the final TMP.</li>
<li>Conclusion : if an attribute E is in the attribute closure A+, then A - > E is logically implied (it’s part of the closure of the set of functional dependencies). </li>
</ul>
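<p>The steps above can be sketched in a few lines of Python. This is a minimal sketch, not an optimized linear-time version: the function name <em>attribute_closure</em> and the representation of each FD as a pair of strings are assumptions made for illustration.</p>

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under the dependencies in `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("CD", "E") for C D -> E.
    """
    closure = set(attrs)          # TMP starts as the input attributes
    changed = True
    while changed:                # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # Apply X -> Y only when X is entirely contained in TMP
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


F = [("A", "B"), ("B", "C"), ("CD", "E")]
a_plus = attribute_closure("A", F)
print(sorted(a_plus))   # ['A', 'B', 'C']
print("E" in a_plus)    # False, so A -> E is not in F+
```

<p>The outer loop matters when the FDs are not listed in a convenient order: a dependency that is skipped in one pass may become applicable after another one has extended TMP.</p>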
<p>Now that we know how to quickly verify if a dependency is <strong>logically implied</strong>… how do we find all the dependencies that are logically implied? Given a set of FDs F, how do we find its closure, F+ ?</p>
<p>Let’s go through an example again:</p>
<ul>
<li>Given F = { A - > B, B - > C}, compute F+</li>
</ul>
<p>The algorithm is pretty simple:</p>
<p>1) Build an empty matrix with all possible combinations of attributes as rows and columns</p>
<p><a href="https://svbtleusercontent.com/chexafblfidlg.png"><img src="https://svbtleusercontent.com/chexafblfidlg_small.png" alt="1.PNG"></a></p>
<p>2) Compute the attribute closures of all attribute combinations</p>
<p><a href="https://svbtleusercontent.com/jazyvmhyk1koow.png"><img src="https://svbtleusercontent.com/jazyvmhyk1koow_small.png" alt="2.PNG"></a></p>
<p>3) Fill the matrix from step 1) by putting a check mark at the intersection of row X and column Y whenever every attribute of Y is part of the attribute closure X+ (from the table computed in step 2).</p>
<p><a href="https://svbtleusercontent.com/brxnvpz6lyrzq.png"><img src="https://svbtleusercontent.com/brxnvpz6lyrzq_small.png" alt="3.PNG"></a></p>
<p>Let’s look at some examples </p>
<ul>
<li>row member A: A+ = ABC, so the columns A, B, C, AB, AC, BC and ABC are all checked because each of them is contained in A+.</li>
<li>row member BC: (BC)+ = BC, so we check the columns B, C and BC. We don’t put a check mark on any column containing A because there is no A in (BC)+.</li>
<li>row member C: C+ contains only C, so we only check the column C. </li>
</ul>
<p>By having a check-mark at say the intersection of row A with column BC we mean that A - > BC is part of the closure F+. This is how we enumerate all the dependencies that are part of the closure F+. </p>
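<p>As a rough sketch, the matrix can also be filled programmatically. The helper names (<em>subsets</em>, <em>fd_closure</em>) are hypothetical, and the approach is deliberately brute force: it enumerates every non-empty combination of attributes, which is only practical for small schemas.</p>

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def subsets(attributes):
    """All non-empty combinations of attributes, as sorted strings."""
    attrs = sorted(attributes)
    return ["".join(combo)
            for size in range(1, len(attrs) + 1)
            for combo in combinations(attrs, size)]


def fd_closure(attributes, fds):
    """Set of pairs (X, Y) such that X -> Y belongs to F+."""
    implied = set()
    for x in subsets(attributes):            # rows of the matrix
        x_plus = attribute_closure(x, fds)
        for y in subsets(attributes):        # columns of the matrix
            if set(y) <= x_plus:             # check mark at (X, Y)
                implied.add((x, y))
    return implied


F = [("A", "B"), ("B", "C")]
f_plus = fd_closure("ABC", F)
print(("A", "BC") in f_plus)   # True:  A -> BC is in F+
print(("BC", "A") in f_plus)   # False: BC -> A is not
```

<p>Note that the result includes trivial dependencies such as A - > A, exactly like the check marks on the diagonal of the matrix.</p>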
<p>Functional dependencies can also be used to find all the <strong>candidate keys</strong>. By definition, a candidate key is <em>a set of columns that can be uniquely used to identify a database record without any irrelevant/unrelated/superfluous data</em>. It is a reduction of the entire collection of attributes, hence a minimization.</p>
<p>Since we are talking about a minimal subset, we can start with the complete set of attributes and then, following functional dependencies, minimize the set until we reach the candidate keys (a set of attributes that can not be reduced). Let’s illustrate this once more by an example.</p>
<p>Say we have F = { A - > B, BC - > E and ED - > A}.</p>
<ul>
<li>1) We know that the set of all attributes is ABCDE.</li>
<li>2) Can we reduce the set by using the first given FD? If we follow A - > B, we can remove B from the main set because B depends on A, and ABCDE already contains A, so the dependent attribute B is superfluous. We obtain ACDE.</li>
<li>3) Can we reduce the set by using the second FD? If we follow BC - > E, we can remove E from the main set because E depends on BC, and ABCDE already contains BC. We obtain ABCD.</li>
<li>4) Can we reduce the set by using the third FD? If we follow ED - > A, we can remove A from the main set because A depends on ED, and ABCDE already contains ED. We obtain BCDE.</li>
<li>5) We now have a new set of attributes : ACDE, ABCD and BCDE. Let’s call them X. </li>
<li>6) Can we simplify any attribute from X by using dependency A - > B ? We can remove B from ABCD because ABCD already contains A, and B depends on A: we obtain ACD. Can we do the same for BCDE? No, because BCDE doesn’t contain A.</li>
<li>7) Can we simplify any attribute from X by using BC - > E ? We can remove E from BCDE because BCDE already contains BC and E depends on BC: we obtain BCD. Can we do the same for ACDE ? No, because ACDE doesn’t contain BC.</li>
<li>8) Can we simplify any attribute from X by using ED - > A ? We can remove A from ACDE because ACDE already contains ED and A depends on ED: we obtain CDE.</li>
<li>9) We now have a new set of attributes : ACD, BCD and CDE. Let’s call them Y. </li>
<li>10) Can we simplify any attribute from Y by using A - > B ? BCD can not be simplified because it doesn’t contain A, and the rest of attributes from Y don’t contain B.</li>
<li>11) Can we simplify any attribute from Y by using BC - > E ? CDE can not be simplified because it doesn’t contain BC, and the rest of attributes from Y don’t contain E. </li>
<li>12) Can we simplify any attribute from Y by using ED - > A ? ACD can not be simplified because it doesn’t contain ED, and the rest of attributes from Y don’t contain A.</li>
<li>Conclusion: the functional dependencies from F can not be used to simplify the subsets from Y, hence they can not be minimized any further. They are our candidate keys: ACD, BCD and CDE. </li>
</ul>
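<p>The result can be cross-checked with a short Python sketch. Note that it uses a different technique from the top-down reduction above: it tries attribute sets from the smallest to the largest and keeps the minimal ones whose attribute closure covers the whole schema. The function names are assumptions made for illustration.</p>

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure contains every attribute."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # A superset of a key already found is not minimal: skip it
            if any(key <= candidate for key in keys):
                continue
            if attribute_closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]


F = [("A", "B"), ("BC", "E"), ("ED", "A")]
print(candidate_keys("ABCDE", F))   # ['ACD', 'BCD', 'CDE']
```

<p>Every key contains C and D: neither attribute appears on the right-hand side of any FD, so nothing else can determine them.</p>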
<p>We notice that functional dependencies help us structure our tables around unique attributes, avoiding superfluous information. </p>
tag:blog.dancrisan.com,2014:Post/intro-to-database-systems-concurrency-control-scheduling-problems2015-04-07T16:24:29-07:002015-04-07T16:24:29-07:00Intro to Database Systems : Concurrency Control - Scheduling problems <p>In real life, users access a database concurrently.</p>
<p>Database access is done through transactions. What is <strong>a transaction</strong>?</p>
<ul>
<li>a unit of work that has to be treated as “a whole”</li>
<li>it has to happen in full or not at all</li>
</ul>
<p>A real life example of a transaction is money transfer:</p>
<ul>
<li>first, withdraw an amount X from account A</li>
<li>second, deposit to account B </li>
</ul>
<p>The previous operation has to succeed in full. You can not stop halfway. Database transactions work the same way. They ensure that, no matter what happens, manipulated data is treated atomically (you can never see “half a change”). </p>
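<p>Here is a minimal sketch of the money transfer using Python’s built-in <em>sqlite3</em> module and an in-memory database. The table name, the account names and the <em>fail_midway</em> flag are assumptions made for illustration; the flag simulates a crash between the withdrawal and the deposit.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from `src` to `dst` as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash between the two steps")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
    except RuntimeError:
        pass  # the withdrawal was rolled back, nothing was changed


def balances(conn):
    """Current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(conn, "A", "B", 30, fail_midway=True)
print(balances(conn))   # {'A': 100, 'B': 50}: no "half a change"
transfer(conn, "A", "B", 30)
print(balances(conn))   # {'A': 70, 'B': 80}
```

<p>Using the connection as a context manager keeps both updates inside one transaction, so the failed attempt leaves account A untouched.</p>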
<p>Atomicity is part of the ACID properties that a DBMS has to maintain:</p>
<ul>
<li>
<strong>Atomicity</strong>: either <strong>all</strong> actions from a transaction happen, or <strong>none</strong> happen</li>
<li>
<strong>Consistency</strong>: the database starts from a consistent state and ends in a consistent state </li>
<li>
<strong>Isolation</strong>: execution of one transaction is isolated from other transactions</li>
<li>
<strong>Durability</strong> : if a transaction commits, its effects persist in the database </li>
</ul>
<p>Now what can go wrong? </p>
<ul>
<li>If not scheduled properly, concurrent processes may violate the <strong>isolation</strong> and <strong>consistency</strong> properties.</li>
</ul>
<p>Let’s imagine a problem where 2 users reserve a seat for a flight:</p>
<ul>
<li>customer 1 finds a seat empty</li>
<li>customer 2 finds the same seat empty</li>
<li>customer 1 reserves the seat</li>
<li>customer 2 reserves the seat</li>
</ul>
<p>Customer 1 will not be happy. This introduces the notion of <strong>serializability</strong>. There needs to be a <strong>concurrency control</strong> mechanism through a schedule. </p>
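<p>The seat reservation problem can be simulated with a toy Python sketch. Everything here (the <em>run</em> function, the representation of a schedule as a list of steps) is an assumption made for illustration; no real DBMS is involved.</p>

```python
def run(schedule):
    """Execute a list of (customer, action) steps on a single seat.

    Returns the final holder of the seat and the list of customers
    who were told that their reservation succeeded.
    """
    seat = None        # shared database object: who holds the seat
    saw_free = {}      # what each customer observed on its last read
    confirmed = []     # customers whose reservation was accepted
    for customer, action in schedule:
        if action == "read":
            saw_free[customer] = seat is None
        elif action == "write" and saw_free.get(customer):
            seat = customer
            confirmed.append(customer)
    return seat, confirmed


interleaved = [(1, "read"), (2, "read"), (1, "write"), (2, "write")]
serial = [(1, "read"), (1, "write"), (2, "read"), (2, "write")]

print(run(interleaved))   # (2, [1, 2]): both confirmed, customer 1 overwritten
print(run(serial))        # (1, [1]): customer 2 sees the seat is taken
```

<p>The interleaved run produces an outcome that no serial execution of the two reservations could produce, which is exactly what concurrency control has to prevent.</p>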
<p>A chronological sequence of the actions of a set of transactions is called a <strong>schedule</strong>. It is a representation of how a set of transactions are executed over time. It can contain the following actions: </p>
<ul>
<li>read R(X)</li>
<li>write W(X)</li>
<li>commit (after completing all its actions, all the operations should be done and recorded)</li>
<li>abort (after executing some actions, if we abort, none of the operations should be done/recorded)</li>
</ul>
<p>For a schedule to be a <strong>complete schedule</strong>, each of its transactions must end with a commit or an abort.</p>
<p>A <strong>serial schedule</strong> is a schedule without interleavings: the operations of each transaction are executed consecutively, one transaction after another. </p>
<p>Two operations in a schedule are <strong>conflicting operations</strong> when they satisfy all of the following conditions: </p>
<ul>
<li>they have to belong to different transactions </li>
<li>they have to access the same data object X </li>
<li>at least one of the operations is a W(X) (write on X)</li>
</ul>
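<p>These three conditions translate directly into a small predicate. The following is a minimal Python sketch; the <em>Op</em> type and the function names are hypothetical.</p>

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str    # transaction identifier, e.g. "T1"
    kind: str   # "R" for a read, "W" for a write
    obj: str    # data object being accessed, e.g. "A"


def conflicts(a, b):
    """True when operations a and b satisfy the three conditions."""
    return (a.txn != b.txn                  # different transactions
            and a.obj == b.obj              # same data object
            and "W" in (a.kind, b.kind))    # at least one write


def conflicting_pairs(schedule):
    """All pairs of conflicting operations, in schedule order."""
    return [(a, b)
            for i, a in enumerate(schedule)
            for b in schedule[i + 1:]
            if conflicts(a, b)]


schedule = [Op("T1", "R", "A"), Op("T2", "W", "A"),
            Op("T1", "R", "A"), Op("T2", "R", "B")]
for a, b in conflicting_pairs(schedule):
    print(f"{a.txn}:{a.kind}({a.obj}) conflicts with {b.txn}:{b.kind}({b.obj})")
```

<p>On this schedule, the write of T2 on A conflicts with both reads of T1, while the read on B conflicts with nothing.</p>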
<p>Let’s see the three kinds of <strong>conflicting operations</strong>:</p>
<ul>
<li>
<strong>The Write-Read Conflict</strong> : reading uncommitted data</li>
<li>
<strong>The Read-Write Conflict</strong> : rereading data that has been altered since the first read. </li>
<li>
<strong>The Write-Write Conflict</strong> : losing updates</li>
</ul>
<p>The <strong>Write-Read Conflict</strong>, also called <strong>reading uncommitted data</strong> or <strong>dirty read</strong>, occurs when a transaction T2 reads a database object A that has been modified by a transaction T1 which has not yet committed. When T1 continues with its transaction (or aborts), the value of A that T2 read may no longer be valid, which leaves the data inconsistent. The next picture helps illustrate the scenario:</p>
<p><a href="https://svbtleusercontent.com/cjhsjmyynxkosa.png"><img src="https://svbtleusercontent.com/cjhsjmyynxkosa_small.png" alt="conc4.png"></a></p>
<p>In other words, a <strong>dirty read</strong> is when a transaction is allowed to read data from a row that has been modified by another running transaction and that modification has not yet been committed.</p>
<p>The <strong>Read-Write Conflict</strong>, also called <strong>unrepeatable read</strong>, occurs when a transaction T1 reads a database object A twice. Between the two reads, a transaction T2 overwrites A and commits. When T1 reads A again, it sees a different version of A even though it did not change A itself: this is the <strong>unrepeatable read</strong>. </p>
<p><a href="https://svbtleusercontent.com/iyedkalgihxc2w.png"><img src="https://svbtleusercontent.com/iyedkalgihxc2w_small.png" alt="conc44.png"></a></p>
<p>A real-life example of this situation is when Bob and Alice are on Ticketmaster and they want to book tickets for a show. There is only one ticket left : Alice signs in, finds that the ticket is expensive and takes the time to think about it… Bob signs in, buys the ticket instantly and then logs off. Alice decides to buy the ticket and finds out that there are no tickets left. </p>
<p>The <strong>Write-Write Conflict</strong>, also called <strong>overwriting uncommitted data</strong>, occurs when a transaction T2 overwrites an object that T1 has written while T1 is still in progress, so that updates get lost. A serial execution can only produce one of two results, either transaction T1’s version or transaction T2’s version, whereas the interleaved execution can leave a mix of both. </p>
<p><a href="https://svbtleusercontent.com/obo2enfkrgksw.png"><img src="https://svbtleusercontent.com/obo2enfkrgksw_small.png" alt="conc444.png"></a></p>
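<p>Here is a small simulation of that situation. It is a sketch under assumptions: the two objects A and B, the values 1000 and 2000 and the helper <em>run</em> are invented for the example, and the writes are “blind” (no read before the write).</p>

```python
def run(schedule):
    """Apply a list of (transaction, object, value) writes in order."""
    db = {"A": 0, "B": 0}
    for _txn, obj, value in schedule:
        db[obj] = value
    return db


T1 = [("T1", "A", 1000), ("T1", "B", 1000)]  # T1 sets both objects to 1000
T2 = [("T2", "A", 2000), ("T2", "B", 2000)]  # T2 sets both objects to 2000

t1_then_t2 = run(T1 + T2)
t2_then_t1 = run(T2 + T1)
# Interleaving: W1(A), W2(A), W2(B), W1(B)
interleaved = run([T1[0], T2[0], T2[1], T1[1]])

print(t1_then_t2, t2_then_t1, interleaved)
```

<p>The two serial orders leave A and B equal, while the interleaved order keeps T2’s write on A and T1’s write on B, a state that no serial execution can produce.</p>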
<p>When concurrent transactions are applied to a database, their schedule is <strong>serializable</strong> if the resulting database state is equal to the outcome of the same transactions executed sequentially, without overlapping in time. This is what we aim for. A schedule that is serializable can also be :</p>
<ul>
<li>ACA : avoid cascading abort</li>
<li>recoverable</li>
<li>strict schedule </li>
</ul>
<p><a href="https://svbtleusercontent.com/fajhbach2jwula.png"><img src="https://svbtleusercontent.com/fajhbach2jwula_small.png" alt="conc9.PNG"></a></p>
<p>A standard way to verify that a schedule is serializable is through a <strong>dependency graph</strong>. </p>
<p>To build a dependency graph we can follow this procedure:</p>
<ul>
<li>1) Represent every transaction by a node</li>
<li>2) Is there a transaction Ty that reads an item after a different transaction Tx writes it? If yes, draw an edge from node Tx to node Ty.</li>
<li>3) Is there a transaction Ty that writes an item after a different transaction Tx reads it? If yes, draw an edge from node Tx to node Ty.</li>
<li>4) Is there a transaction Ty that writes an item after a different transaction Tx has written that item? If yes, draw an edge from node Tx to node Ty.</li>
</ul>
<p>Don’t forget to remove the edges of a transaction that ends up aborting: only committed transactions count.</p>
<p>In order to have a serializable schedule, the dependency graph <strong>has to be acyclic</strong> (it doesn’t have any cycles, closed paths).</p>
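<p>The procedure can be sketched in a few lines. The assumptions: a schedule is a list of (transaction, action, object) tuples containing only the reads and writes of committed transactions, and the function names are invented for this example. The cycle check is a plain depth-first search.</p>

```python
from itertools import combinations


def dependency_graph(schedule):
    """Build {Tx: set of Ty} from a list of (txn, kind, obj) actions."""
    graph = {txn: set() for txn, _, _ in schedule}
    # combinations() keeps the list order, so the first action happens earlier
    for (tx, kx, ox), (ty, ky, oy) in combinations(schedule, 2):
        if tx != ty and ox == oy and "W" in (kx, ky):
            graph[tx].add(ty)  # edge from the earlier to the later transaction
    return graph


def has_cycle(graph):
    """Depth-first search: reaching a node that is still in progress closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(dependency_graph(schedule))


not_serializable = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
serializable = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
                ("T2", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
                ("T2", "R", "B"), ("T2", "W", "B")]
print(is_conflict_serializable(not_serializable))
print(is_conflict_serializable(serializable))
```

<p>In the first schedule T1 reads A before T2 writes it (edge T1 to T2) and T2 writes A before T1 writes it (edge T2 to T1), which closes a cycle. In the second one every edge goes from T1 to T2.</p>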
<p>The following schedule is not serializable:</p>
<p><a href="https://svbtleusercontent.com/zwe64pgesffa.png"><img src="https://svbtleusercontent.com/zwe64pgesffa_small.png" alt="conc7.PNG"></a></p>
<p>The following schedule is serializable: </p>
<p><a href="https://svbtleusercontent.com/bd3mem7htjmhiw.png"><img src="https://svbtleusercontent.com/bd3mem7htjmhiw_small.png" alt="conc8.png"></a></p>
<p>Now, how do we know <strong>when a schedule is strict</strong>?</p>
<ul>
<li>when an object written by a transaction T cannot be read or written by any other transaction until T commits or aborts. </li>
</ul>
<p>How do we know <strong>when a schedule avoids cascading aborts</strong>?</p>
<ul>
<li>when transactions only read data written by transactions that have already committed</li>
</ul>
<p>How do we know <strong>when a schedule is recoverable</strong>?</p>
<ul>
<li>when, for each pair of transactions where Ty reads some data written by Tx, the <strong>COMMIT</strong> operation of Tx appears before the <strong>COMMIT</strong> operation of Ty. </li>
</ul>
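<p>The three definitions can be checked mechanically. The sketch below makes simplifying assumptions: a schedule is a list of (transaction, action, object) tuples, the only actions are "R", "W" and "C" (commit), and every transaction commits, so aborts are left out. The name <em>classify</em> is invented for this example.</p>

```python
def classify(schedule):
    """Return which of the three properties hold for a schedule without aborts."""
    committed = set()   # transactions that have committed so far
    last_writer = {}    # object -> transaction that wrote it last
    reads_from = {}     # transaction -> transactions whose writes it has read
    result = {"recoverable": True, "aca": True, "strict": True}
    for txn, kind, obj in schedule:
        if kind == "C":
            # recoverable: everyone we read from must have committed before us
            if not reads_from.get(txn, set()) <= committed:
                result["recoverable"] = False
            committed.add(txn)
            continue
        writer = last_writer.get(obj)
        if writer not in (None, txn):
            if writer not in committed:
                result["strict"] = False   # touched an uncommitted write
                if kind == "R":
                    result["aca"] = False  # read uncommitted data
            if kind == "R":
                reads_from.setdefault(txn, set()).add(writer)
        if kind == "W":
            last_writer[obj] = txn
    return result


dirty_read = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
clean = [("T1", "W", "A"), ("T1", "C", None), ("T2", "R", "A"), ("T2", "C", None)]
print(classify(dirty_read))
print(classify(clean))
```

<p>In the first schedule T2 reads A before T1 commits and even commits first, so none of the three properties hold. In the second one T1 commits before T2 touches A, so all three hold.</p>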
<p>The point of enumerating all those schedule classes is to define some concurrency control : measures such that non-serializable execution can never happen. </p>
tag:blog.dancrisan.com,2014:Post/intro-to-database-systems-indexing-part-2-b-trees2015-04-04T19:15:26-07:002015-04-04T19:15:26-07:00Intro to Database Systems : Indexing Part 2 - B+ trees<p>In the previous section, <a href="http://blog.dancrisan.com/intro-to-database-systems-indexing">Indexing Part 1</a>, we’ve seen that building an index for frequently used attributes considerably increases the efficiency of a query.</p>
<p>In this section we’ll discuss the most widely used index implementation: <strong>the B+ Tree</strong>.</p>
<p><a href="https://svbtleusercontent.com/cwdbtvkpqsmjbg.png"><img src="https://svbtleusercontent.com/cwdbtvkpqsmjbg_small.png" alt="b+.PNG"></a></p>
<p>Each <strong>node</strong> of a B+ tree is a page, a block of data. A page is the transfer unit to disk. </p>
<p>We already know that a table spans many blocks of data. We can picture the tree as analogous to the table and the nodes as analogous to the blocks of data, keeping in mind that each block of data contains multiple rows. </p>
<p>So far we’ve talked about two things stored on a disk : <strong>the index</strong> and <strong>the data</strong>. The index (in blue below) points at the data (in green below). This makes it clear that creating an index takes extra space on the disk.</p>
<p><a href="https://svbtleusercontent.com/sl5xrq3eos2w.png"><img src="https://svbtleusercontent.com/sl5xrq3eos2w_small.png" alt="btree-index1.png"></a></p>
<p>Let’s analyze the leaves of a B+ tree. We notice that they are structured as a linked list with 2 pointers:</p>
<ul>
<li>one pointer towards the next node</li>
<li>one pointer towards the data. </li>
</ul>
<p>Having this tree structure (and not only a sequential linked list structure) helps for insertion and deletion complexities: they have a logarithmic running time.</p>
<p>Say we have a B+ tree with a height h = 2. In this case, 3 blocks of data will be accessed:</p>
<ul>
<li>the root</li>
<li>the leaf holding the pointers </li>
<li>the data page corresponding to the rows (referenced by the pointers from the leaf)<br>
</li>
</ul>
<p>How do we recognize a B+ tree? </p>
<p>Let’s say <em>d</em> is the maximum number of references (pointers) that a node can hold to its children.</p>
<p>In order to be a valid B+ tree, it has to respect the following invariants:</p>
<ul>
<li>every <strong>leaf</strong> is at the <strong>same distance</strong> from the <strong>root</strong>
</li>
<li>if a node has <strong>d pointers</strong>, the node has to contain <strong>d-1 keys</strong>
</li>
<li>the <strong>root</strong> has <strong>at least 2 children</strong> (unless it is itself a leaf)
</li>
<li>every <strong>non-leaf</strong> AND <strong>non-root</strong> node has <strong>at least d/2 children</strong> (rounded up)
</li>
<li>every <strong>leaf</strong> contains at least <strong>floor d/2 keys</strong>
</li>
<li>every <strong>key</strong> of the column <strong>appears in a leaf</strong>
</li>
</ul>
<p>Let’s see <strong>how inserting a key works in a B+ tree</strong>. Say the key belongs in a node X. The main algorithm is the following : </p>
<p><strong>Step 1:</strong> If node X has empty space, insert (key, ref) into the node.</p>
<p><strong>Step 2:</strong> If node X is already full:</p>
<ul>
<li>
<strong>2A)</strong> split X into 2 nodes : X1 and X2</li>
<li>
<strong>2B)</strong> distribute keys evenly between 2 nodes</li>
<li>
<strong>2C)</strong> If node X is a leaf : take the minimum value of the 2nd node X2 and insert it in the parent node by repeating the algorithm from Step 1</li>
<li>
<strong>2D)</strong> If node X is a non-leaf : take the minimum value of the 2nd node X2, exclude it from the split and insert it in the parent node by repeating the algorithm from Step 1</li>
</ul>
<p>Let’s go through a few examples. Assume each page holds 4 pointers (hence at most 3 keys) and we insert the following key values one by one : 2, 3, 5, 7, 11, 17, 19, 23, 29, 31.</p>
<p>An empty node will look like the following. We notice the 4 empty spaces at the edges for the 4 pointers : </p>
<p><a href="https://svbtleusercontent.com/8ne2fve1u0nn4w.png"><img src="https://svbtleusercontent.com/8ne2fve1u0nn4w_small.png" alt="emptyRow.PNG"></a></p>
<p>1) If we want to insert 2, and then 3, and then 5, we just follow <strong>Step 1</strong> of the algorithm (the node has empty space).</p>
<p><a href="https://svbtleusercontent.com/fq6jslidvs0dfw.png"><img src="https://svbtleusercontent.com/fq6jslidvs0dfw_small.png" alt="2)full node.PNG"></a> </p>
<p>2) Now let’s insert 7. </p>
<p><a href="https://svbtleusercontent.com/nzfagzchgfleq.png"><img src="https://svbtleusercontent.com/nzfagzchgfleq_small.png" alt="3).PNG"></a></p>
<p>What happened? </p>
<ul>
<li>
<strong>Step 2</strong>: we notice from part 1 that node X is already full. </li>
<li>
<strong>2A)</strong>: split node X into 2 nodes : X1 and X2 </li>
<li>
<strong>2B)</strong>: distribute the keys evenly between the 2 nodes (we have 2 and 3 in node X1 and 5 and 7 in node X2)</li>
<li>
<strong>2C)</strong>: node X was indeed a leaf : the minimum value of the 2nd node X2 is 5. We insert it into the parent node by repeating <strong>Step 1</strong> of the algorithm (X was the root, so a new parent node is created, and it has empty space).</li>
<li>
<strong>Step 1 but at the parent’s node:</strong> there is space, stop.</li>
</ul>
<p>3) Let’s insert 11.</p>
<p><a href="https://svbtleusercontent.com/f7tbv5jtxey9ga.png"><img src="https://svbtleusercontent.com/f7tbv5jtxey9ga_small.png" alt="4).PNG"></a></p>
<p>What happened?</p>
<ul>
<li>
<strong>Step 1</strong>: we notice from part 2 that node X has space. </li>
</ul>
<p>4) Insert 17. </p>
<p><a href="https://svbtleusercontent.com/o3oabdbhvgh3sa.png"><img src="https://svbtleusercontent.com/o3oabdbhvgh3sa_small.png" alt="5).PNG"></a></p>
<p>What happened?</p>
<ul>
<li>we notice from part 3 that there is no more space, hence we continue to <strong>step 2</strong>
</li>
<li>
<strong>2A:</strong> we split the node X (containing 5, 7, 11) into 2 nodes</li>
<li>
<strong>2B:</strong> we distribute evenly between 2 nodes (we have 5, 7 in X1 and 11, 17 in X2)</li>
<li>
<strong>2C:</strong> X was a leaf, hence we take minimum value of 2nd node X2 (11) and insert it into the parent</li>
<li>
<strong>Step 1 but at the parent’s node:</strong> there is space, stop.</li>
</ul>
<p>5) Insert 19.</p>
<p><a href="https://svbtleusercontent.com/bo4nws85k2gzqq.png"><img src="https://svbtleusercontent.com/bo4nws85k2gzqq_small.png" alt="6).PNG"></a></p>
<p>What happened? </p>
<ul>
<li>
<strong>Step 1</strong>: there was space, we insert and then we stop.</li>
</ul>
<p>6) Insert 23.</p>
<p><a href="https://svbtleusercontent.com/wykgjbquowjjdg.png"><img src="https://svbtleusercontent.com/wykgjbquowjjdg_small.png" alt="7).PNG"></a></p>
<p>What happened?</p>
<ul>
<li>
<strong>Step 1</strong>: there is no more space, continue.</li>
<li>
<strong>Step 2A</strong>: we split the node X (containing 11, 17, 19) into 2 nodes</li>
<li>
<strong>Step 2B</strong>: we distribute evenly between 2 nodes </li>
<li>
<strong>Step 2C</strong>: X was a leaf, we take the minimum of 2nd node (19) and we insert it into the parent</li>
<li>
<strong>Step 1 but at the parent’s node:</strong> there is space, stop.</li>
</ul>
<p>7) Insert 29.</p>
<p><a href="https://svbtleusercontent.com/gzvmklj4fkw0w.png"><img src="https://svbtleusercontent.com/gzvmklj4fkw0w_small.png" alt="8).PNG"></a></p>
<p>What happened? </p>
<ul>
<li>
<strong>Step 1</strong>: there was space, we insert and then we stop.</li>
</ul>
<p>8) Insert 31:</p>
<p><a href="https://svbtleusercontent.com/5xsk6qb40bxlkw.png"><img src="https://svbtleusercontent.com/5xsk6qb40bxlkw_small.png" alt="9).PNG"></a></p>
<p>What happened?</p>
<ul>
<li>
<strong>Step 1</strong>: there is no space, continue.</li>
<li>
<strong>Step 2A</strong>: we split the node X (containing 19, 23, 29) into 2 nodes</li>
<li>
<strong>Step 2B</strong>: we distribute evenly between 2 nodes</li>
<li>
<strong>Step 2C</strong>: X was a leaf, we take the minimum of 2nd node (29) and we insert it into the parent</li>
<li>
<strong>Step 1 but at the parent’s node:</strong> there is no space, continue.</li>
<li>
<strong>Step 2A:</strong> we split the new node X (containing 5, 11, 19) into 2 nodes</li>
<li>
<strong>Step 2B:</strong> we distribute evenly between 2 nodes</li>
<li>
<strong>Step 2D:</strong> since the new node X was not a leaf, we exclude the minimum value of the 2nd node (19) and we insert it into the new parent</li>
<li>
<strong>Step 1 but at the parent’s node:</strong> X was the root, so a new root is created to hold 19; it has space, stop.</li>
</ul>
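<p>The walkthrough above can be replayed with a short program. This is only a sketch of the algorithm, under assumptions: a node holds at most 3 keys (4 pointers), row references and the leaf-to-leaf “next” pointers are left out, and a node is allowed to overflow by one key for a moment before being split, which gives the same result as checking for fullness first. All the names (<em>Node</em>, <em>insert</em>, <em>levels</em>) are invented for the example.</p>

```python
import bisect

MAX_KEYS = 3  # assumption: 4 pointers per node, hence at most 3 keys


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes, used by non-leaf nodes only


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    new_root = Node(leaf=False)  # the old root was split: grow one level
    new_root.keys = [separator]
    new_root.children = [root, right]
    return new_root


def _insert(node, key):
    """Insert below node; return (separator, new right node) if node was split."""
    if node.leaf:
        bisect.insort(node.keys, key)
    else:
        i = bisect.bisect_right(node.keys, key)
        split = _insert(node.children[i], key)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None                 # Step 1: there was room
    mid = len(node.keys) // 2       # Steps 2A and 2B: split evenly
    right = Node(node.leaf)
    if node.leaf:
        # Step 2C: the minimum of the right node is copied into the parent
        right.keys = node.keys[mid:]
        separator = right.keys[0]
    else:
        # Step 2D: the minimum of the right half moves up and is kept in neither half
        separator = node.keys[mid]
        right.keys = node.keys[mid + 1:]
        right.children = node.children[mid + 1:]
        node.children = node.children[:mid + 1]
    node.keys = node.keys[:mid]
    return separator, right


def levels(root):
    """Return the keys of every node, level by level, from the root down."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


root = Node(leaf=True)
for k in [2, 3, 5, 7, 11, 17, 19, 23, 29, 31]:
    root = insert(root, k)
for level in levels(root):
    print(level)
```

<p>After the last insertion the root holds 19, its two children hold (5, 11) and (29), and the five leaves hold the ten keys in pairs, as in the last picture.</p>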
<p><a href="http://goneill.co.nz/btree-demo.php">This simulator</a> is pretty neat for testing your own implementations of B+ trees (many thanks to Joy & Graham from New Zealand). </p>
<p>Now why are we using B+ trees? </p>
<p>We notice that, unlike traversing a linked list, accessing any part of the tree requires visiting only a few nodes. Also, increasing the number of children per node decreases the depth of the tree, hence the number of “hops” (time-consuming disk reads) required. </p>
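<p>A back-of-the-envelope sketch shows how quickly the depth shrinks. It assumes an idealized tree in which every node is completely full and has the same number of children (the fanout), which a real B+ tree does not guarantee. The function name and the one million entries are picked for the illustration.</p>

```python
def levels_needed(n_entries, fanout):
    """Smallest number of levels such that fanout ** levels >= n_entries."""
    levels, capacity = 1, fanout
    while capacity < n_entries:
        capacity *= fanout
        levels += 1
    return levels


for fanout in (2, 18, 100):
    print(fanout, levels_needed(1_000_000, fanout))
```

<p>With 2 children per node, one million entries need 20 levels. With 100 children per node, 3 levels are enough.</p>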
tag:blog.dancrisan.com,2014:Post/intro-to-database-systems-indexing2015-04-02T16:18:13-07:002015-04-02T16:18:13-07:00Intro to Database Systems : Indexing<p>We learned in the last lecture that when data is stored on disks, it is organized as a set of blocks of data (also called pages). A block is accessed as a whole, in its entirety. On the disk, blocks are structured like a linked list:</p>
<ul>
<li>each block has a section containing data</li>
<li>each block has a pointer to the location of the next node (next block/page)</li>
</ul>
<p>We will demonstrate how useful indexes are through a bunch of examples. An index helps us find rows faster. It is useful for queries on attributes that are used frequently. <strong>Indexing</strong> is just a fancy word for “sorting a column in order to efficiently query an element”. </p>
<p>Let’s start with some examples and define N as the number of blocks that the entire table requires.</p>
<p>We already know that searching on a column that <strong>isn’t sorted</strong> requires N/2 block accesses on average using <strong>Linear Search</strong>. Even worse, if the column doesn’t contain unique entries (say we have 2 people with the firstName “Dan”), the entire table must be searched. That’s N block accesses (because the duplicated element could be in the last row of the column).</p>
<p>Now let’s assume that the column is sorted (and that’s what an index does). By using <strong>Binary Search</strong>, we will obtain log2 N block accesses. Since the data is sorted, we won’t need to search the rest of the table for duplicate values. </p>
<p>Creating an index is basically creating a data structure that holds the column value and a pointer to the records. </p>
<p>Let’s consider a database that doesn’t have an index. For simplicity, we have a table with only two columns: firstName, lastName. </p>
<p>Say we have:</p>
<ul>
<li>r = 5 000 000 as the number of rows in the table</li>
<li>R = 204 bytes as the fixed size of each row (record length)</li>
<li>B = 1024 bytes as the default block size (size of each data block)</li>
</ul>
<p>How many rows fit in a disk block?</p>
<ul>
<li>1024 / 204 = 5 rows per disk block (rounded down)</li>
</ul>
<p>How many blocks does our table take?</p>
<ul>
<li>5 000 000 / 5 = 1 000 000 blocks per table. </li>
</ul>
<p>This is our N. We know that a query on a non-sorted column traverses N / 2 blocks on average, hence 1 000 000 / 2 = 500 000 blocks. If we allow duplicates, we will have 1 000 000 block accesses. </p>
<p>If the column is already sorted and then we search for an element, we need log2 (1 000 000) ≈ 20 block accesses.</p>
<p>We notice that, from 500 000 block accesses to 20 block accesses, the performance increase is substantial. </p>
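<p>The same arithmetic, written out as a small script so that the numbers can be changed. The variable names are mine; the values are the ones used above.</p>

```python
import math

rows = 5_000_000    # r, number of rows in the table
row_size = 204      # R, bytes per row
block_size = 1024   # B, bytes per block

rows_per_block = block_size // row_size        # only whole rows fit in a block
blocks = math.ceil(rows / rows_per_block)      # N
linear_average = blocks // 2                   # unsorted column, unique values
linear_worst = blocks                          # unsorted column, duplicates allowed
binary_search = math.ceil(math.log2(blocks))   # sorted column

print(rows_per_block, blocks, linear_average, linear_worst, binary_search)
```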
<p>Now that we’ve seen the impact of a query on a sorted column, let’s introduce an index. Let’s pretend we index the <strong>firstName</strong> attribute. As we said, creating an index on a column implies creating a data structure that holds:</p>
<ul>
<li>a value : in this case the firstName field takes 50 bytes.</li>
<li>a pointer to the record it relates to : row pointer is 4 bytes<br>
</li>
</ul>
<p>Say we have:</p>
<ul>
<li>r = 5 000 000 as the number of rows in the table</li>
<li>R = 54 bytes as the index record length</li>
<li>B = 1024 bytes as the default block size (size of data block)</li>
</ul>
<p>How many index records fit in a disk block?</p>
<ul>
<li>1024 / 54 = 18 records per disk block (rounded down)</li>
</ul>
<p>How many blocks does our index take?</p>
<ul>
<li>5 000 000 / 18 = 277 778 blocks (rounded up)</li>
</ul>
<p>This is the number of blocks that would need to be accessed, in the worst case, if the index were not sorted and we had to scan it to find a particular row. </p>
<p>Since we’ve used an index, the column is already sorted. Hence, when we query on it, we binary search through the index with about log2 (277 778) block accesses, which rounds up to 19. The last step is to follow the pointer, hence 19 + 1 = 20 block accesses to find a particular element in an indexed column. </p>
<p>Again, we notice that from 277 778 block accesses to 20 block accesses, the performance increase is substantial.</p>
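<p>The cost of a lookup through the index can be wrapped in a small function. It is a sketch: the function and parameter names are invented, and it assumes a flat sorted index that is binary searched block by block, exactly as in the reasoning above.</p>

```python
import math


def index_lookup_cost(rows, key_size, pointer_size, block_size):
    """Return (index blocks, block accesses needed to find one row)."""
    record_size = key_size + pointer_size           # one index record
    records_per_block = block_size // record_size   # only whole records fit
    index_blocks = math.ceil(rows / records_per_block)
    search = math.ceil(math.log2(index_blocks))     # binary search in the index
    return index_blocks, search + 1                 # +1 to follow the row pointer


index_blocks, accesses = index_lookup_cost(
    rows=5_000_000, key_size=50, pointer_size=4, block_size=1024
)
print(index_blocks, accesses)
```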
<p>Now why can’t we simply sort the table and then search it, instead of searching through an indexed column? </p>
<p>Don’t forget that sorting actually changes the underlying physical order of the data. Indexing creates a separate index file that references rows in the active table, allowing direct access to those rows through a data structure, the B+ tree, which we will introduce in the next chapter.</p>
<p>Given that creating an index requires additional disk space (in our example, 277 778 extra blocks), this is potentially the main drawback: we win on time complexity, but we lose on space complexity. </p>
tag:blog.dancrisan.com,2014:Post/intro-to-database-systems-basic-perspectives-on-disk-and-buffer-management2015-03-26T17:04:28-07:002015-03-26T17:04:28-07:00Intro to Database Systems : Basic Perspectives on Disk and Buffer Management <p>The Database Management System stores information at 3 levels of the <strong>memory hierarchy</strong>:</p>
<ul>
<li>Primary storage - <strong>main memory</strong> (and cache) : for currently used data, it is fast and usually volatile.</li>
<li>Secondary storage - <strong>magnetic (“hard”) disk</strong> : for persistent data, it is relatively slow and nonvolatile. It stores the main database. </li>
<li>Tertiary storage - <strong>tape</strong> : nonvolatile older version of the data. </li>
</ul>
<p>Now why can’t we store everything in the main memory, if it’s the fastest way? Because …</p>
<ul>
<li>it costs too much : with 100$ you buy 4GB of RAM, but 2000GB of disk (500 times more)</li>
<li>it is volatile : we want to save data between runs, not only at run time! </li>
</ul>
<p>Why can’t we store everything on tape?</p>
<ul>
<li>disks allow <strong>random access</strong>, whereas tapes only allow <strong>sequential</strong> access
</li>
</ul>
<p><strong>Disk blocks</strong> or <strong>pages</strong> are the main units for measuring retrieved data. They have a fixed usable size, usually 512 bytes. We can <strong>read</strong> (from disk to RAM) or <strong>write</strong> (from RAM to disk) pages. </p>
<p>The <strong>seek time</strong> is the most time consuming operation when accessing data on disk (from 1 to 20 msec). To compare, accessing data from the main memory is in the order of nanoseconds. </p>
<p>The lowest layers of the Database Management System are in charge of how space is used on the disk. Higher levels depend on a buffer (the lowest layer of the DBMS) to:</p>
<ul>
<li>allocate/de-allocate a block of memory (page)</li>
<li>read/write a block of memory (page)</li>
</ul>
<p>In other words, the <strong>buffer manager</strong> is doing 3 things:</p>
<ul>
<li>1) manages the functions for reading data that’s in the RAM</li>
<li>2) brings pages (disk blocks) from the database into the buffer cache (also called the <strong>buffer pool</strong>)</li>
<li>3) writes modified pages back to the disk. </li>
</ul>
<p>When data has to be loaded from the disk:</p>
<ul>
<li>if there is an empty frame available in the pool, the buffer manager picks it</li>
<li>if there is no empty frame in the pool, the buffer manager picks a frame <strong>for replacement</strong>
</li>
</ul>
<p>A frame chosen for replacement must have a <strong>pin counter</strong> of 0. Once the requested page is loaded into the frame, its <strong>pin counter</strong> becomes 1. </p>
<p>Whenever a page that is already in the buffer is requested again, its pin counter is incremented. We can state this as a general rule: “when requesting a page that is already in the buffer, its pin counter is incremented”. After the operation is finished, we decrement it. If the page (the disk block) is modified, a dirty bit is set so that the frame is written back to the disk (update) before it is reused. </p>
<p>If there are no empty frames, only unpinned pages (pin counter = 0) can be chosen to accept loaded pages from the disk: this is the <strong>replacement policy</strong>. (This makes sense: if every frame is occupied and pinned, we have to wait for a transaction to finish with its page and decrement the pin counter. Only when a pin counter gets back to 0 is the frame unlocked for replacement.) </p>
<p>A DBMS maintains its own buffer rather than using that of the OS, so that it controls when pages are evicted from it, through the implementation of <strong>pin counters</strong> and <strong>replacement policies</strong>. </p>
<p><a href="https://svbtleusercontent.com/3joqgx8byxjxq.png"><img src="https://svbtleusercontent.com/3joqgx8byxjxq_small.png" alt="mem.PNG"></a></p>
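<p>To see pin counters and dirty bits at work, here is a toy buffer manager. It is a sketch under assumptions: the “disk” is a plain dictionary, the class and method names are invented, the frame to replace is simply the first unpinned one found (real systems use policies such as LRU), and a dirty page is written back at the moment its frame is replaced.</p>

```python
class BufferPool:
    """Toy buffer manager: a fixed number of frames in front of a 'disk' dict."""

    def __init__(self, n_frames, disk):
        self.n_frames = n_frames
        self.disk = disk   # page_id -> data, stands in for the real disk
        self.frames = {}   # page_id -> {"data": ..., "pin": int, "dirty": bool}

    def pin(self, page_id):
        """Request a page: load it if needed, then increment its pin counter."""
        frame = self.frames.get(page_id)
        if frame is None:                          # page is not in the pool yet
            if len(self.frames) >= self.n_frames:  # no empty frame left
                self._replace()
            frame = {"data": self.disk[page_id], "pin": 0, "dirty": False}
            self.frames[page_id] = frame
        frame["pin"] += 1
        return frame["data"]

    def unpin(self, page_id, new_data=None):
        """Release a page; passing new_data marks the page as modified (dirty)."""
        frame = self.frames[page_id]
        if new_data is not None:
            frame["data"], frame["dirty"] = new_data, True
        frame["pin"] -= 1

    def _replace(self):
        """Free one frame; only pages with a pin counter of 0 are candidates."""
        victim = next((p for p, f in self.frames.items() if f["pin"] == 0), None)
        if victim is None:
            raise RuntimeError("every frame is pinned, the request has to wait")
        frame = self.frames.pop(victim)
        if frame["dirty"]:
            self.disk[victim] = frame["data"]      # write the modified page back


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(n_frames=2, disk=disk)
pool.pin(1)
pool.pin(2)
pool.unpin(1, new_data="A")  # page 1 was modified and is no longer pinned
pool.pin(3)                  # page 1 is replaced and written back first
print(disk[1], sorted(pool.frames))
```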
tag:blog.dancrisan.com,2014:Post/intro-to-database-systems-part-14-to-16-triggers2015-03-19T18:09:47-07:002015-03-19T18:09:47-07:00Intro to Database Systems - Part 14 to 16 : Triggers<p>A <strong>trigger</strong> is a procedure that executes automatically as soon as specified changes occur in the DBMS. </p>
<p>A trigger has 3 parts:</p>
<ul>
<li>
<strong>an event</strong> : what type of change should fire the procedure? Usually it happens before/after/insteadOf an insert/update/delete.</li>
<li>
<strong>an action</strong> : what happens if the trigger runs? (example: add student to scholarshipStudList)</li>
<li>
<strong>a condition</strong>: under which condition does the procedure get executed once the event has fired? In other words, when does the action get executed? (example: add student to scholarshipStudList <strong>only when</strong> studGPA > 3.6).</li>
</ul>
<p>Let’s start by creating 4 tables and see how a trigger affects them: </p>
<ul>
<li>CREATE TABLE test1(a1 INT);</li>
<li>CREATE TABLE test2(a2 INT);</li>
<li>CREATE TABLE test3(a3 INT NOT NULL AUTO_INCREMENT PRIMARY KEY);</li>
<li>CREATE TABLE test4(
a4 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
b4 INT DEFAULT 0
);</li>
</ul>
<p>Now let’s define a trigger on <strong>table test1</strong>:</p>
<ul>
<li>CREATE TRIGGER testReference BEFORE INSERT ON test1
FOR EACH ROW
BEGIN
INSERT INTO test2 SET a2 = NEW.a1;
DELETE FROM test3 WHERE a3 = NEW.a1;
UPDATE test4 SET b4 = b4 + 1 WHERE a4 = NEW.a1;
END;</li>
</ul>
<p>We notice that we have two parts.</p>
<p>1) An event part: at what type of change should the event happen? </p>
<ul>
<li>BEFORE INSERT ON test1</li>
</ul>
<p>2) An action part: what happens if the trigger runs? </p>
<ul>
<li>BEGIN
INSERT INTO test2 SET a2 = NEW.a1;
DELETE FROM test3 WHERE a3 = NEW.a1;
UPDATE test4 SET b4 = b4 + 1 WHERE a4 = NEW.a1;
END;</li>
</ul>
<p>We notice that:</p>
<ul>
<li>the action part is surrounded by the keywords <strong>BEGIN</strong> and <strong>END</strong> : we are using them to encapsulate more than one action.</li>
<li>there is no condition part. This parameter is optional. </li>
</ul>
<p>Now let’s populate the tables and see how they look for now, before touching table <strong>test1</strong>:</p>
<ul>
<li>INSERT INTO test3 (a3) VALUES
(NULL), (NULL), (NULL), (NULL), (NULL),
(NULL), (NULL), (NULL), (NULL), (NULL);</li>
<li>INSERT INTO test4 (a4) VALUES
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0);</li>
</ul>
<p><a href="https://svbtleusercontent.com/a0gpcr1uxzjkbg.png"><img src="https://svbtleusercontent.com/a0gpcr1uxzjkbg_small.png" alt="dbt.PNG"></a></p>
<p>Let’s insert into table test1 and see how the tables look now:</p>
<ul>
<li>INSERT INTO test1 (a1) VALUES (1), (3), (1), (7), (1), (8), (4), (4);</li>
</ul>
<p>Here is how the trigger updates the data of the other tables before each insertion:</p>
<p>1) INSERT INTO test2 SET a2 = NEW.a1;</p>
<ul>
<li>every value inserted into a1 is copied into a2 (NEW refers to the row that is about to be inserted)</li>
</ul>
<p>2) DELETE FROM test3 WHERE a3 = NEW.a1;</p>
<ul>
<li>we’ve deleted all the values from a3 that are the same as the values inserted into a1. All the other values remain. (1 is present in a1, we delete it; 2 isn’t present, we keep it; 3 is present, as well as 4 and 7 and 8. We delete them)</li>
</ul>
<p>3) UPDATE test4 SET b4 = b4 + 1 WHERE a4 = NEW.a1;</p>
<ul>
<li>we are updating b4 every time (for each row) that we find a1 = a4. (For 1, a1 = a4 three times: increment b4 three times; For 2, there is no value 2 in a1: don’t increment b4; For 3, we find it once in a1: increment once; For 4, we find it twice: increment twice.)</li>
</ul>
<p><a href="https://svbtleusercontent.com/0hfeygeld9wueq.png"><img src="https://svbtleusercontent.com/0hfeygeld9wueq_small.png" alt="db2.PNG"></a></p>
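<p>If you want to replay this example without a MySQL server, here is a sketch using SQLite through Python’s built-in sqlite3 module. It is a translation, not the original code: SQLite has no INSERT … SET form and spells auto-increment differently, so those statements are rewritten, and the tables are populated with NULL instead of 0 so that SQLite generates the keys 1 to 10.</p>

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test1(a1 INT);
CREATE TABLE test2(a2 INT);
CREATE TABLE test3(a3 INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE test4(a4 INTEGER PRIMARY KEY AUTOINCREMENT, b4 INT DEFAULT 0);

CREATE TRIGGER testReference BEFORE INSERT ON test1
FOR EACH ROW
BEGIN
    INSERT INTO test2(a2) VALUES (NEW.a1);
    DELETE FROM test3 WHERE a3 = NEW.a1;
    UPDATE test4 SET b4 = b4 + 1 WHERE a4 = NEW.a1;
END;
""")

# Populate test3 and test4 with the generated keys 1..10
for _ in range(10):
    con.execute("INSERT INTO test3(a3) VALUES (NULL)")
    con.execute("INSERT INTO test4(a4) VALUES (NULL)")

# Every inserted row fires the trigger once
con.executemany("INSERT INTO test1(a1) VALUES (?)",
                [(v,) for v in (1, 3, 1, 7, 1, 8, 4, 4)])

test2 = [row[0] for row in con.execute("SELECT a2 FROM test2 ORDER BY rowid")]
test3 = [row[0] for row in con.execute("SELECT a3 FROM test3 ORDER BY a3")]
test4 = con.execute("SELECT a4, b4 FROM test4 ORDER BY a4").fetchall()
print(test2)
print(test3)
print(test4)
```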
<p>We have 2 types of actions performed:</p>
<ul>
<li>
<strong>FOR EACH STATEMENT</strong> : fired once per triggering statement, independently of the number of rows affected.</li>
<li>
<strong>FOR EACH ROW</strong> : fired once for every row of the table that the statement modifies. </li>
</ul>
<p>Some triggers only run under a <strong>WHEN</strong> condition, which can refer to the new and old data through the NEW and OLD keywords:</p>
<ul>
<li>CREATE TRIGGER ratingIncrease AFTER UPDATE OF rating ON Skaters REFERENCING OLD AS o NEW AS n FOR EACH ROW WHEN (n.rating > 1 + o.rating) UPDATE Skaters SET rating = 1 + o.rating WHERE sid = n.sid;</li>
</ul>
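<p>Here is a runnable sketch of the same idea, again with SQLite through Python’s sqlite3 module. The assumptions: SQLite has no REFERENCING clause, so the trigger uses NEW and OLD directly and wraps its action in BEGIN … END; the column types and the two sample skaters are invented for the example.</p>

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Skaters(sid INTEGER PRIMARY KEY, rating REAL);
INSERT INTO Skaters VALUES (1, 5.0), (2, 5.0);

CREATE TRIGGER ratingIncrease AFTER UPDATE OF rating ON Skaters
FOR EACH ROW
WHEN NEW.rating > 1 + OLD.rating
BEGIN
    UPDATE Skaters SET rating = 1 + OLD.rating WHERE sid = NEW.sid;
END;
""")

con.execute("UPDATE Skaters SET rating = 9.0 WHERE sid = 1")  # jump larger than 1
con.execute("UPDATE Skaters SET rating = 5.5 WHERE sid = 2")  # small increase
ratings = dict(con.execute("SELECT sid, rating FROM Skaters"))
print(ratings)
```

<p>The first update satisfies the WHEN condition, so the trigger caps the increase at 1. The second update doesn’t satisfy it, so the trigger’s action is skipped.</p>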